## Algorithmics and Applications of Tree and Graph Searching (2002)

Venue: | Symposium on Principles of Database Systems |

Citations: | 109 - 8 self |

### BibTeX

@INPROCEEDINGS{Shasha02algorithmicsand,

author = {Dennis Shasha and Jason T. L. Wang and Rosalba Giugno},

title = {Algorithmics and Applications of Tree and Graph Searching},

booktitle = {Symposium on Principles of Database Systems},

year = {2002},

pages = {39--52}

}

### Abstract

Modern search engines answer keyword-based queries extremely efficiently. The impressive speed is due to clever inverted index structures, caching, a domain-independent knowledge of strings, and thousands of machines. Several research efforts have attempted to generalize keyword search to keytree and keygraph searching, because trees and graphs have many applications in next-generation database systems. This paper surveys both algorithms and applications, giving some emphasis to our own work.

### Citations

10921 |
Computers and Intractability: A Guide to the Theory of NP-Completeness
- Garey, Johnson
Citation Context ...a subisomorphism between Ga and Gb (Figure 12). The complexity of such an algorithm is exponential, but it is the best known algorithm; the problem of subgraph isomorphism is proven to be NP-complete [42]. Figure 12: All the maps between Ga and Gb. The leaves in the rectangular frames correspond to subisomorphisms between Ga and Gb. There have been many attempts to reduce the combinatorial cost of... |

666 | The Lorel query language for semistructured data
- Abiteboul, Quass, et al.
- 1997
Citation Context ...1. INTRODUCTION Next-generation database systems dealing with XML, Web, network directories and structured documents often model the data as trees and graphs. These data modeling efforts include Lorel [3], StruQL [38], and UnQL [17, 19], for semistructured data, XQuery [15], XML-QL [34], XPath [72] and XSL [67], for XML data, and [45] for structured documents. There have been several proposed approach... |

644 | Suffix arrays: a new method for on-line string searches
- Manber, Myers
- 1990
Citation Context ...rray Based Algorithm The pathfix algorithm works in two phases. In the first phase, the database building phase, the algorithm encodes each root-to-leaf path of every data tree into a suffix array database [66]. In the second phase, the on-line search phase in which the query tree Q is given, the algorithm compares Q with each data tree D in the database D allowing a difference ≤ DIFF, i.e. at most DIFF path... |

557 | Principles of artificial intelligence - Nilsson - 1982 |

521 |
Data on the web: from relations to semistructured data and XML
- Abiteboul, Buneman, et al.
- 2000
Citation Context ...], XML-QL [34], XPath [72] and XSL [67], for XML data, and [45] for structured documents. There have been several proposed approaches for querying trees [8, 9, 40, 45, 53, 78] and for querying graphs [2, 3, 16, 29, 46, 47, 70, 71]. Besides applications over XML data, these algorithms have applications to scientific databases where data are naturally represented by trees (such as phylogeny) and graphs (such as molecular database... |

505 | DataGuide: enabling query formulation and optimization in semistructured databases
- Goldman, Widom
- 1997
Citation Context ...rocessing the path expressions in the query through the database. Several systems for querying and indexing graph databases have been implemented, both general-purpose [30, 46] and application-specific [44, 55, 73]. The underlying techniques are described in the next section. 3.1 Keygraph Searching in Graph Databases Cook et al. [30, 35] applied an improvement of the inexact graph matching method (algorithm A ... |

492 | Querying semi-structured data
- Abiteboul
- 1997
Citation Context ... graph traversals. Goldman, Widom [44] and colleagues [77] proposed a system, called Lore, to store and query a semistructured database (which is modeled as a large rooted labeled directed graph; see [1, 88, 92] for a survey). Lore uses four kinds of indices to accelerate (regular) path expression searching. For each edge label l in the graph, a value index (Vindex) is used to index all the nodes that have i... |

485 |
Information retrieval: Data structures and algorithms. Englewood Cliffs
- Frakes, Baeza-Yates
- 1992
Citation Context ...he descendants of Mary who is a child of John. Since paths are represented as strings, existing algorithms and tools such as AGrep for string searching are applicable to processing these queries (see [10, 11, 12, 41] for a review). XISS [64] is an XML indexing and querying system designed to support regular path expressions. In each XML tree, each node is associated with a pair of integers enabling the determinati... |

380 | A query language and optimization techniques for unstructured data
- Buneman, Davidson, et al.
- 1996
Citation Context ...tion database systems dealing with XML, Web, network directories and structured documents often model the data as trees and graphs. These data modeling efforts include Lorel [3], StruQL [38], and UnQL [17, 19], for semistructured data, XQuery [15], XML-QL [34], XPath [72] and XSL [67], for XML data, and [45] for structured documents. There have been several proposed approaches for querying trees [8, 9, 40,... |

318 |
Fast text search allowing errors
- Manber, Wu
- 1992
Citation Context ...lds a set of candidate trees to look for. 2.3.4 Implementation The search and filtering algorithms just described are collectively referred to as ATreeGrep (whose name is shamelessly adapted from AGrep [103] for approximate string searching and SGrep [51] for structure grep). We have implemented ATreeGrep in an XML search engine, called XML Query by Example (XML QBE), which takes an example XML fragment (... |

301 | Efficient filtering of XML documents for selective dissemination of information - Altinel, Franklin - 2000 |

290 | Indexing and querying XML data for regular path expressions
- Li, Moon
- 2001
Citation Context ... of John. Since paths are represented as strings, existing algorithms and tools such as AGrep for string searching are applicable to processing these queries (see [10, 11, 12, 41] for a review). XISS [64] is an XML indexing and querying system designed to support regular path expressions. In each XML tree, each node is associated with a pair of integers enabling the determination of ancestor-descendant... |

285 | Index structures for path expressions
- Milo, Suciu
- 1999
Citation Context ...ts. Their algorithms are useful for query optimization since the size of a query tree affects the efficiency of tree pattern matching. 2.2 Path-Only Searches Many AC queries are concerned with paths only [4, 18, 73], e.g. find the descendants of Mary who is a child of John. Since paths are represented as strings, existing algorithms and tools such as AGrep for string searching are applicable to processing these qu... |

283 | A graduated assignment algorithm for graph matching
- Gold, Rangarajan
- 1996
Citation Context ...been many attempts to reduce the combinatorial cost of AC query processing in graphs or keygraph searching. They can be classified as approximate, inexact, and exact algorithms. Approximate algorithms [6, 27, 37, 43, 91, 101] have polynomial complexity but they are not guaranteed to find a correct solution. Exact and inexact algorithms do find correct answers and therefore have exponential worst-case complexity [14, 48, 50, 6... |

251 | Constraints for semi-structured data
- Buneman, Fan, et al.
- 2001
Citation Context ...], XML-QL [34], XPath [72] and XSL [67], for XML data, and [45] for structured documents. There have been several proposed approaches for querying trees [8, 9, 40, 45, 53, 78] and for querying graphs [2, 3, 16, 29, 46, 47, 70, 71]. Besides applications over XML data, these algorithms have applications to scientific databases where data are naturally represented by trees (such as phylogeny) and graphs (such as molecular database... |

239 | Structural Joins: A Primitive for Efficient XML Query Pattern Matching - Al-Khalifa, Koudas, et al. - 2002 |

202 | An Algorithm for Subgraph Isomorphism
- Ullmann
Citation Context ...8]. The most popular exact (and inexact) subgraph matching algorithms are based on heuristics on the state-space representation tree that corresponds to a subisomorphism. Ullmann's Algorithm. Ullmann [90] presented an algorithm for an exact subgraph matching based on the state space search with backtracking algorithm in [32]. A depth-first search on the state space tree representation depicts the algori... |

189 | Catching the boat with Strudel: Experiences with a web-site management system
- Fernandez, Florescu, et al.
- 1998
Citation Context ...ION Next-generation database systems dealing with XML, Web, network directories and structured documents often model the data as trees and graphs. These data modeling efforts include Lorel [3], StruQL [38], and UnQL [17, 19], for semistructured data, XQuery [15], XML-QL [34], XPath [72] and XSL [67], for XML data, and [45] for structured documents. There have been several proposed approaches for queryi... |

185 | Query optimization for XML
- McHugh, Widom
Citation Context ...e algorithm is exponential though the algorithm may run much faster depending on the data. 2.4.2 Selectivity Estimation One technique for filtering trees out faster is to use selectivity estimation. In [69] McHugh and Widom describe Lorel's cost-based query optimizer, which maintains statistics about subpaths of length k, and uses it to infer selectivity estimates of longer path queries. Krishnan et a... |

168 | Structural matching in computer vision using probabilistic relaxation
- Christmas, Kittler, et al.
- 1995
Citation Context ...been many attempts to reduce the combinatorial cost of AC query processing in graphs or keygraph searching. They can be classified as approximate, inexact, and exact algorithms. Approximate algorithms [6, 27, 37, 43, 91, 101] have polynomial complexity but they are not guaranteed to find a correct solution. Exact and inexact algorithms do find correct answers and therefore have exponential worst-case complexity [14, 48, 50, 6... |

166 | GraphLog: a visual formalism for real life recursion
- Consens, Mendelzon
Citation Context ...], XML-QL [34], XPath [72] and XSL [67], for XML data, and [45] for structured documents. There have been several proposed approaches for querying trees [8, 9, 40, 45, 53, 78] and for querying graphs [2, 3, 16, 29, 46, 47, 70, 71]. Besides applications over XML data, these algorithms have applications to scientific databases where data are naturally represented by trees (such as phylogeny) and graphs (such as molecular database... |

152 | Substructure discovery using minimum description length and background knowledge
- Cook, Holder
- 1994
Citation Context ...e set of paths that result from processing the path expressions in the query through the database. Several systems for querying and indexing graph databases have been implemented, both general-purpose [30, 46] and application-specific [44, 55, 73]. The underlying techniques are described in the next section. 3.1 Keygraph Searching in Graph Databases Cook et al. [30, 35] applied an improvement of the inexact... |

148 | Regular path queries with constraints
- Abiteboul, Vianu
- 1999
Citation Context ...ts. Their algorithms are useful for query optimization since the size of a query tree affects the efficiency of tree pattern matching. 2.2 Path-Only Searches Many AC queries are concerned with paths only [4, 18, 73], e.g. find the descendants of Mary who is a child of John. Since paths are represented as strings, existing algorithms and tools such as AGrep for string searching are applicable to processing these qu... |

148 |
An eigendecomposition approach to weighted graph matching problems
- Umeyama
- 1988
Citation Context ...been many attempts to reduce the combinatorial cost of AC query processing in graphs or keygraph searching. They can be classified as approximate, inexact, and exact algorithms. Approximate algorithms [6, 27, 37, 43, 91, 101] have polynomial complexity but they are not guaranteed to find a correct solution. Exact and inexact algorithms do find correct answers and therefore have exponential worst-case complexity [14, 48, 50, 6... |

142 | Change detection in hierarchically structured information
- Chawathe, Rajaraman, et al.
- 1996
Citation Context ... or approximately appears, in a data tree. Here, the "approximation" is measured by the number of paths in the query tree that do not appear in the data tree [84], or by some other distance functions [22, 23, 24, 85, 96, 104, 107]. The query tree may contain don't cares or wildcards [72]. There are fixed length don't cares (FLDCs), "?", that may match a single node and variable length don't cares (VLDCs), "*" [84]. We shall re... |

137 |
Pattern matching in trees
- Hoffmann, O’Donnell
- 1982
Citation Context ..."*" matches a path of the data tree. This query finds all the phylogenetic trees in TreeBASE that contain the query tree. 2.4 Related Approaches 2.4.1 Approximate Embedding Queries Hoffmann and O'Donnell [49], and later Ramesh and Ramakrishnan [81], and Cole et al. [28] presented algorithms for finding the occurrences of a wildcard-free ordered query tree Q in an ordered data tree D. (In an ordered tree, th... |

133 |
A distance measure between attributed relational graphs for pattern recognition
- Sanfeliu, Fu
- 1983
Citation Context ..., 43, 91, 101] have polynomial complexity but they are not guaranteed to find a correct solution. Exact and inexact algorithms do find correct answers and therefore have exponential worst-case complexity [14, 48, 50, 63, 74, 79, 82, 102]. Inexact algorithms employ error correction techniques for a noisy data graph. These algorithms employ a cost function to measure the similarity of the graphs. For example, a cost function may be de... |

123 | Minimization of tree pattern queries
- Amer-Yahia, Cho, et al.
- 2001
Citation Context ...L [17, 19], for semistructured data, XQuery [15], XML-QL [34], XPath [72] and XSL [67], for XML data, and [45] for structured documents. There have been several proposed approaches for querying trees [8, 9, 40, 45, 53, 78] and for querying graphs [2, 3, 16, 29, 46, 47, 70, 71]. Besides applications over XML data, these algorithms have applications to scientific databases where data are naturally represented by trees (su... |

118 |
On the Editing Distance between Unordered Labeled Trees
- Zhang, Statman, et al.
- 1992
Citation Context ... or approximately appears, in a data tree. Here, the "approximation" is measured by the number of paths in the query tree that do not appear in the data tree [84], or by some other distance functions [22, 23, 24, 85, 96, 104, 107]. The query tree may contain don't cares or wildcards [72]. There are fixed length don't cares (FLDCs), "?", that may match a single node and variable length don't cares (VLDCs), "*" [84]. We shall re... |

116 | Finding frequent substructures in chemical compounds
- Dehaspe, Toivonen, et al.
- 1998
Citation Context ...inexact matching. Develop a framework for selectivity estimation for queries on trees and graphs with wildcards. Develop a framework for turning searching to pattern discovery in trees and graphs [33, 94, 95, 100]. Develop support for semantic extensions: semi-flexible or flexible queries [56] in which parent-child relationships in queries may become ancestor-descendant or even descendant-ancestor relationships... |

115 | Meaningful change detection in structured data
- Chawathe, Garcia-Molina
- 1997
Citation Context ... or approximately appears, in a data tree. Here, the "approximation" is measured by the number of paths in the query tree that do not appear in the data tree [84], or by some other distance functions [22, 23, 24, 85, 96, 104, 107]. The query tree may contain don't cares or wildcards [72]. There are fixed length don't cares (FLDCs), "?", that may match a single node and variable length don't cares (VLDCs), "*" [84]. We shall re... |

112 | Finding regular simple paths in graph databases
- Mendelzon, Wood
- 1995

107 | UnQL: A query language and algebra for semistructured data based on structural recursion
- Buneman, Fernandez, et al.
Citation Context ...tion database systems dealing with XML, Web, network directories and structured documents often model the data as trees and graphs. These data modeling efforts include Lorel [3], StruQL [38], and UnQL [17, 19], for semistructured data, XQuery [15], XML-QL [34], XPath [72] and XSL [67], for XML data, and [45] for structured documents. There have been several proposed approaches for querying trees [8, 9, 40,... |

102 | A graph-oriented object database model for database end-user interfaces
- Gyssens, Paredaens, et al.
- 1990

90 | Stereo correspondence through feature grouping and maximal clique
- Horaud, Skordas
- 1989
Citation Context ..., 43, 91, 101] have polynomial complexity but they are not guaranteed to find a correct solution. Exact and inexact algorithms do find correct answers and therefore have exponential worst-case complexity [14, 48, 50, 63, 74, 79, 82, 102]. Inexact algorithms employ error correction techniques for a noisy data graph. These algorithms employ a cost function to measure the similarity of the graphs. For example, a cost function may be de... |

86 | Representative objects: concise representations of semistructured, hierarchical data
- Nestorov, Ullman, et al.
- 1997
Citation Context ...ds. For most queries, the matching is implemented using application specific techniques. However queries including wildcards may require exhaustive graph traversals. Goldman, Widom [44] and colleagues [77] proposed a system, called Lore, to store and query a semistructured database (which is modeled as a large rooted labeled directed graph; see [1, 88, 92] for a survey). Lore uses four kinds of indices... |

83 | Typechecking for semistructured data
- Suciu
- 2001
Citation Context ... graph traversals. Goldman, Widom [44] and colleagues [77] proposed a system, called Lore, to store and query a semistructured database (which is modeled as a large rooted labeled directed graph; see [1, 88, 92] for a survey). Lore uses four kinds of indices to accelerate (regular) path expression searching. For each edge label l in the graph, a value index (Vindex) is used to index all the nodes that have i... |

81 | Mind your grammar: a new approach to modelling text
- Gonnet, Tompa
- 1987
Citation Context ...e data as trees and graphs. These data modeling efforts include Lorel [3], StruQL [38], and UnQL [17, 19], for semistructured data, XQuery [15], XML-QL [34], XPath [72] and XSL [67], for XML data, and [45] for structured documents. There have been several proposed approaches for querying trees [8, 9, 40, 45, 53, 78] and for querying graphs [2, 3, 16, 29, 46, 47, 70, 71]. Besides applications over XML d... |

79 | Representing and querying changes in semistructured data
- Chawathe, Abiteboul, et al.
- 1998

78 |
Inexact graph matching for structural pattern recognition
- Bunke, Allermann
- 1982
Citation Context ...in Graph Databases Cook et al. [30, 35] applied an improvement of the inexact graph matching method (algorithm A*) described by Nilsson [79] based on an inexact graph matching algorithm proposed in [21] to find similar repetitive subgraphs in a single-graph database. Thus, their methods are primarily of interest for the third step above. Their system, SUBDUE, has been applied to discovery and search f... |

74 | A web Odyssey: from Codd to XML
- Vianu
Citation Context ... graph traversals. Goldman, Widom [44] and colleagues [77] proposed a system, called Lore, to store and query a semistructured database (which is modeled as a large rooted labeled directed graph; see [1, 88, 92] for a survey). Lore uses four kinds of indices to accelerate (regular) path expression searching. For each edge label l in the graph, a value index (Vindex) is used to index all the nodes that have i... |

72 | Tree matching problems with applications to structured text databases
- Kilpeläinen
- 1992
Citation Context ...p is preserved in the data tree. Figure 5 shows this: the matching data tree has a "Name" node that is missing from the query tree. This type of embedding is also known as tree inclusion as defined in [57, 58, 59, 60], where Kilpeläinen and Mannila showed the problem to be NP-complete. The AE queries complement the AC queries described in Section 2.1. The notion of "approximation" can be further generalized by int... |

70 |
Error-correcting isomorphism of attributed relational graphs for pattern analysis
- Tsai, Fu
- 1979
Citation Context ...be defined based on semantic or syntactic transformations to transform one graph into another. (Of course, approximate algorithms can also be used for noisy data graphs.) Relevant work can be found in [20, 21, 31, 36, 39, 65, 75, 89, 97, 98]. The most popular exact (and inexact) subgraph matching algorithms are based on heuristics on the state-space representation tree that corresponds to a subisomorphism. Ullmann's Algorithm. Ullmann [9... |

65 | Counting Twig Matches in a Tree
- Chen, Jagadish, et al.
- 2001
Citation Context ...y strings containing wildcards, i.e. estimating the number of strings in a database that contain a given query string with wildcards. Other relevant work can be found in [26, 52, 54, 99]. Chen et al. [25] generalized the selectivity estimation problem for unordered trees. Specifically, given a data tree D and a wildcard-free query tree Q, which the authors called a twig, they estimate the total number ... |

63 | Path constraints in semistructured and structured databases
- Buneman, Fan, et al.
- 1998
Citation Context ...ts. Their algorithms are useful for query optimization since the size of a query tree affects the efficiency of tree pattern matching. 2.2 Path-Only Searches Many AC queries are concerned with paths only [4, 18, 73], e.g. find the descendants of Mary who is a child of John. Since paths are represented as strings, existing algorithms and tools such as AGrep for string searching are applicable to processing these qu... |

62 | A system for approximate tree matching
- Wang, Zhang, et al.
- 1994
Citation Context ...o are ancestors of Alex and also descendants of Mary. This query could be expressed by a tree pattern, as shown in Figure 2(b). The node "*" in the tree pattern is a variable length don't care (VLDC) [93, 106], which would be instantiated into (matched with) a path of nodes of a data tree at no cost. In our example, the nodes in the family tree matched by the VLDC "*" (here, Bill and Adam) would be returne... |

59 |
Subgraph isomorphism, matching relational structures, and maximal cliques
- Barrow, Burstall
- 1976
Citation Context ..., 43, 91, 101] have polynomial complexity but they are not guaranteed to find a correct solution. Exact and inexact algorithms do find correct answers and therefore have exponential worst-case complexity [14, 48, 50, 63, 74, 79, 82, 102]. Inexact algorithms employ error correction techniques for a noisy data graph. These algorithms employ a cost function to measure the similarity of the graphs. For example, a cost function may be de... |

58 |
Ordered and unordered tree inclusion
- Kilpeläinen, Mannila
- 1995
Citation Context ...p is preserved in the data tree. Figure 5 shows this: the matching data tree has a "Name" node that is missing from the query tree. This type of embedding is also known as tree inclusion as defined in [57, 58, 59, 60], where Kilpeläinen and Mannila showed the problem to be NP-complete. The AE queries complement the AC queries described in Section 2.1. The notion of "approximation" can be further generalized by int... |

57 | A linear programming approach for the weighted graph matching problem
- Almohamad, Duffuaa
- 1993

57 |
An efficient algorithm for graph isomorphism
- Corneil, Gotlieb
- 1970
Citation Context ...ation tree that corresponds to a subisomorphism. Ullmann's Algorithm. Ullmann [90] presented an algorithm for an exact subgraph matching based on the state space search with backtracking algorithm in [32]. A depth-first search on the state space tree representation depicts the algorithm's progress. When a node (a pair of matching vertices) is added to the tree, the isomorphism conditions are checked in ... |