## Ternary Directed Acyclic Word Graphs (2003)

### BibTeX

@MISC{Miyamoto03ternarydirected,
    author = {Satoru Miyamoto and Shunsuke Inenaga and Masayuki Takeda and Ayumi Shinohara},
    title = {Ternary Directed Acyclic Word Graphs},
    year = {2003}
}

### Abstract

Given a set S of strings, a DFA accepting S offers a very time-efficient solution to the pattern matching problem over S. The key is how to implement such a DFA in the trade-off between time and space, and especially the choice of how to implement the transitions of each state is critical. Bentley and Sedgewick proposed an effective tree structure called ternary trees. The idea of ternary trees is to 'implant' the process of binary search for transitions into the structure of the trees themselves. This way the process of binary search becomes visible, and the implementation of the trees becomes quite easy. The directed acyclic word graph (DAWG) of a string w is the smallest DFA that accepts all suffixes of w, and requires only linear space. We apply the scheme of ternary trees to DAWGs, introducing a new data structure named ternary DAWGs (TDAWGs). We perform some experiments that show the efficiency of TDAWGs, compared to DAWGs in which transitions are implemented by tables and linked lists.
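The 'implanted' binary search described in the abstract can be illustrated with a plain ternary search tree in the style of Bentley and Sedgewick. The following is a minimal sketch, not the authors' TDAWG construction: the names `Node`, `insert`, and `contains` are hypothetical, and the tree is left unbalanced. Each node holds one character; the `lo` and `hi` links carry the binary search among the outgoing transitions of a state, and the `eq` link follows the matched transition to the next character.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    ch: str
    lo: Optional["Node"] = None   # sibling transitions with a smaller character
    eq: Optional["Node"] = None   # continuation after matching ch
    hi: Optional["Node"] = None   # sibling transitions with a larger character
    is_end: bool = False          # True if a stored string ends at this node


def insert(root: Optional[Node], word: str) -> Node:
    """Insert word into the tree rooted at root and return the (possibly new) root."""
    if not word:
        raise ValueError("empty strings are not stored in this sketch")
    if root is None:
        root = Node(word[0])
    node, i = root, 0
    while True:
        c = word[i]
        if c < node.ch:
            if node.lo is None:
                node.lo = Node(c)
            node = node.lo
        elif c > node.ch:
            if node.hi is None:
                node.hi = Node(c)
            node = node.hi
        else:
            # Character matched: advance in the word and follow the eq link.
            i += 1
            if i == len(word):
                node.is_end = True
                return root
            if node.eq is None:
                node.eq = Node(word[i])
            node = node.eq


def contains(root: Optional[Node], word: str) -> bool:
    """Return True if word was inserted as a whole string."""
    node, i = root, 0
    while node is not None and i < len(word):
        c = word[i]
        if c < node.ch:
            node = node.lo
        elif c > node.ch:
            node = node.hi
        else:
            i += 1
            if i == len(word):
                return node.is_end
            node = node.eq
    return False
```

For example, inserting the suffixes of `cocoa` one by one with `root = insert(root, s)` yields a tree in which `contains(root, "coa")` holds while `contains(root, "co")` does not, because `co` is only a prefix of stored strings. A TDAWG, as the abstract describes it, applies the same lo/eq/hi scheme to the states of a DAWG rather than to a tree.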

### Citations

953 | Algorithms on strings, trees, and sequences
- Gusfield
- 1997
Citation Context ...n, Chinese, and so on. We emphasize that the benefit of the ternary-based implementation is not limited to DAWGs. Namely, it can be applied to any automata-oriented index structure such as suffix trees [16, 12, 14, 10] and compact directed acyclic word graphs (CDAWGs) [5, 9, 11]. Therefore, we can also consider ternary suffix trees and ternary CDAWGs. Concerning the experimental results on TDAWGs, ternary suffix trees ...

568 | A space economical suffix tree construction algorithm
- McCreight
- 1976
Citation Context ... Chinese, and so on. We emphasize that the benefit of the ternary-based implementation is not limited to DAWGs. Namely, it can be applied to any automata-oriented index structure such as suffix trees [16, 12, 14, 10] and compact directed acyclic word graphs (CDAWGs) [5, 9, 11]. Therefore, we can also consider ternary suffix trees and ternary CDAWGs. Concerning the experimental results on TDAWGs, ternary suffix tre...

445 | Linear pattern matching algorithms
- Weiner
- 1973
Citation Context ...n, Chinese, and so on. We emphasize that the benefit of the ternary-based implementation is not limited to DAWGs. Namely, it can be applied to any automata-oriented index structure such as suffix trees [16, 12, 14, 10] and compact directed acyclic word graphs (CDAWGs) [5, 9, 11]. Therefore, we can also consider ternary suffix trees and ternary CDAWGs. Concerning the experimental results on TDAWGs, ternary suffix trees ...

343 | On-Line Construction of Suffix Trees
- Ukkonen
- 1995
Citation Context ... Chinese, and so on. We emphasize that the benefit of the ternary-based implementation is not limited to DAWGs. Namely, it can be applied to any automata-oriented index structure such as suffix trees [16, 12, 14, 10] and compact directed acyclic word graphs (CDAWGs) [5, 9, 11]. Therefore, we can also consider ternary suffix trees and ternary CDAWGs. Concerning the experimental results on TDAWGs, ternary suffix tre...

341 | Text Algorithms
- Crochemore, Rytter
- 1994
Citation Context ...points to the state in which y is accepted. DAWGs were first introduced by Blumer et al. [4], and have widely been used for solving the substring pattern matching problem, and in various applications [7, 8, 15]. Theorem 1 (Crochemore [6]). For any string w ∈ Σ∗, DAWG(w) is the smallest (partial) DFA that recognizes Suffix(w). Theorem 2 (Blumer et al. [4]). For any string w ∈ Σ∗ with |w| > 1, DAWG(w) has a...

151 | Fast algorithms for sorting and searching strings
- Bentley, Sedgewick
- 1997
Citation Context ...g for pattern p takes O(|Σ|·|p|) time in both worst and average cases. It is easy to imagine that this should be a serious disadvantage when searching texts of a large alphabet. Bentley and Sedgewick [3] introduced an effective tree structure called ternary search trees (to be simply called ternary trees in this paper), for storing a set of strings. The idea of ternary trees is to 'implant' the proces...

101 | The smallest automaton recognizing the subwords of a text. Theoret
- Blumer, Blumer, et al.
- 1985
Citation Context ...er or not p is a substring of w. Clearly, a DFA that recognizes the set of all suffixes of w permits us to solve this problem very quickly. The smallest DFA of this kind was introduced by Blumer et al. [4], called the directed acyclic word graph (DAWG) of string w, that only requires O(|w|) space. In this paper, we apply the scheme of ternary trees to DAWGs, yielding a new data structure called ternary...

90 | Transducers and repetitions
- Crochemore
- 1986
Citation Context ...accepted. DAWGs were first introduced by Blumer et al. [4], and have widely been used for solving the substring pattern matching problem, and in various applications [7, 8, 15]. Theorem 1 (Crochemore [6]). For any string w ∈ Σ∗, DAWG(w) is the smallest (partial) DFA that recognizes Suffix(w). Theorem 2 (Blumer et al. [4]). For any string w ∈ Σ∗ with |w| > 1, DAWG(w) has at most 2|w| - 1 states and ...

60 | A space-economical suffix tree construction algorithm
- McCreight
- 1976
Citation Context ...n, Chinese, and so on. We emphasize that the benefit of the ternary-based implementation is not limited to DAWGs. Namely, it can be applied to any automata-oriented index structure such as suffix trees [16, 12, 14, 10] and compact directed acyclic word graphs (CDAWGs) [5, 9, 11]. Therefore, we can also consider ternary suffix trees and ternary CDAWGs. Concerning the experimental results on TDAWGs, ternary suffix trees ...

60 | Complete inverted files for efficient text retrieval and analysis
- Blumer, Blumer, et al.
- 1987
Citation Context ...based implementation is not limited to DAWGs. Namely, it can be applied to any automata-oriented index structure such as suffix trees [16, 12, 14, 10] and compact directed acyclic word graphs (CDAWGs) [5, 9, 11]. Therefore, we can also consider ternary suffix trees and ternary CDAWGs. Concerning the experimental results on TDAWGs, ternary suffix trees and ternary CDAWGs are promising to perform very well in ...

55 | Minimisation of acyclic deterministic automata in linear time
- Revuz
- 1992
Citation Context ...2|w| - 1 states and 3|w| - 3 transitions. It is a trivial fact that DAWG(w) can be constructed in time proportional to the number of transitions in STrie(w) by the DAG-minimization algorithm by Revuz [13]. However, the number of transitions of STrie(w) is unfortunately quadratic in |w|. The direct construction of DAWG(w) in linear time is therefore significant, in order to avoid creating redundant sta...

30 | On-line construction of suffix trees
- Ukkonen
- 1995

20 | Approximate string matching with suffix automata. Algorithmica 10, 353–364. Preliminary version in Rep
- Ukkonen, Wood
- 1993
Citation Context ...points to the state in which y is accepted. DAWGs were first introduced by Blumer et al. [4], and have widely been used for solving the substring pattern matching problem, and in various applications [7, 8, 15]. Theorem 1 (Crochemore [6]). For any string w ∈ Σ∗, DAWG(w) is the smallest (partial) DFA that recognizes Suffix(w). Theorem 2 (Blumer et al. [4]). For any string w ∈ Σ∗ with |w| > 1, DAWG(w) has a...

19 | An algorithm for the organisation of information
- Adel’son-Vel’skii, Landis
- 1962
Citation Context ...and is not implementation-friendly. Further, it does not allow us to update the input text string since it is just an off-line algorithm. However, as for TDAWGs, we can apply the technique of AVL trees [1] for balancing the states of TDAWGs. In this scheme, the states for CharSet_w(x) for any substring x are "AVL-balanced", and thus the time to search for pattern p is O(log |Σ|·|p|) even in the worst c...

17 | On compact directed acyclic word graphs
- Crochemore, Vérin
- 1997
Citation Context ... basic automaton of this kind is the suffix trie. The suffix trie of a string w ∈ Σ∗ is denoted by STrie(w). What is obtained by minimizing STrie(w) is called the directed acyclic word graph (DAWG) of w [9], denoted by DAWG(w). In Fig. 1 we show STrie(w) and DAWG(w) with w = cocoa. The initial state of DAWG(w) is also called the source state, and the state accepting w is called the sink state of DAWG(w)...

16 | On-Line Construction of Compact Directed Acyclic Word Graphs
- Inenaga, Hoshino, et al.
Citation Context ...-based implementation is not limited to DAWGs. Namely, it can be applied to any automata-oriented index structure such as suffix trees [16, 12, 14, 10] and compact directed acyclic word graphs (CDAWGs) [5, 9, 11]. Therefore, we can also consider ternary suffix trees and ternary CDAWGs. Concerning the experimental results on TDAWGs, ternary suffix trees and ternary CDAWGs are promising to perform very well in prac...

14 | Ternary Search Trees
- Bentley, Sedgewick
- 1998
Citation Context ...set S of strings, constructs its ternary tree in O(|Σ|·‖S‖) time with O(‖S‖) space, where ‖S‖ denotes the total length of the strings in S. They also showed several nice applications of ternary trees [2]. We in this paper consider the most fundamental pattern matching problem on strings, the substring pattern matching problem, which is described as follows: Given a text string w and pattern string p,...

10 | Complete inverted files for efficient text retrieval and analysis
- Blumer, Blumer, et al.
- 1987
Citation Context ...-based implementation is not limited to DAWGs. Namely, it can be applied to any automata-oriented index structure such as suffix trees [16, 12, 14, 10] and compact directed acyclic word graphs (CDAWGs) [5, 9, 11]. Therefore, we can also consider ternary suffix trees and ternary CDAWGs. Concerning the experimental results on TDAWGs, ternary suffix trees and ternary CDAWGs are promising to perform very well in prac...

3 | Approximate string matching with suffix automata
- Ukkonen, Wood
- 1993
Citation Context ...points to the state in which y is accepted. DAWGs were first introduced by Blumer et al. [4], and have widely been used for solving the substring pattern matching problem, and in various applications [7, 8, 15]. Theorem 1 (Crochemore [6]). For any string w ∈ Σ∗, DAWG(w) is the smallest (partial) DFA that recognizes Suffix(w). Theorem 2 (Blumer et al. [4]). For any string w ∈ Σ∗ with |w| > 1, DAWG(w) has a...