Results 11-20 of 44
Directed Acyclic Subsequence Graph for multiple texts
1999
Cited by 8 (0 self)
The subsequence matching problem is to decide, for given strings S and T, whether S is a subsequence of T. The string S is called the pattern and the string T the text. We consider the case of multiple texts and show how to solve the subsequence matching problem in time linear in the length of the pattern. For this purpose we build an automaton that accepts all subsequences of the given texts. The automaton is called the Directed Acyclic Subsequence Graph (DASG), and we present an algorithm for building it. We also prove an upper bound on its number of states. Further, we consider a modification of the subsequence matching problem: given a string S and a finite language L, we are to decide whether S is a subsequence of any string in L. We suppose that a finite automaton accepting L is given and present an algorithm for building the DASG for the language L. We also mention applications of the DASG to some problems related to subsequences.
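To make the "time linear in the length of the pattern" claim concrete, here is a minimal sketch for the simpler single-text case. It is not the DASG construction from the paper: it precomputes a next-occurrence table for one text, which behaves like a subsequence automaton for that text. The function names `build_next_table`, `is_subsequence`, and `matches_any` are illustrative assumptions. The multi-text helper simply tries each table in turn, so its cost grows with the number of texts, which is the overhead a DASG is meant to avoid.

```python
def build_next_table(text):
    """Return nxt where nxt[i][c] is the smallest j >= i with text[j] == c.

    Characters that do not occur at or after position i are simply absent.
    Space is O(len(text) * alphabet size); this is a sketch, not a tuned structure.
    """
    n = len(text)
    nxt = [dict() for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        nxt[i] = dict(nxt[i + 1])  # inherit occurrences to the right
        nxt[i][text[i]] = i        # the nearest occurrence of text[i] is i itself
    return nxt


def is_subsequence(pattern, nxt):
    """Decide whether pattern is a subsequence of the preprocessed text.

    One table lookup per pattern character, independent of the text length.
    """
    pos = 0
    for c in pattern:
        j = nxt[pos].get(c)
        if j is None:
            return False
        pos = j + 1
    return True


def matches_any(pattern, tables):
    """Naive multi-text variant: try each preprocessed text in turn."""
    return any(is_subsequence(pattern, t) for t in tables)


if __name__ == "__main__":
    table = build_next_table("abcab")
    print(is_subsequence("acb", table))
    print(is_subsequence("cc", table))
```

Each entry of the table plays the role of a state, and each lookup the role of a transition, so matching never rescans the text.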
Direct Building of Minimal Automaton for a Given List
Cited by 8 (3 self)
This paper presents a method for the direct construction of a minimal acyclic finite-state automaton that recognizes a given finite list of words in lexicographical order. The size of the temporary automata needed during the construction is less than the size of the resulting minimal automaton plus the length of one of the longest words in the list. This property is the main advantage of our method.
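As an illustration of building a minimal acyclic automaton directly from a sorted word list, the sketch below follows the widely known incremental scheme for sorted input: states off the current word's path are merged with an equivalent registered state as soon as they can no longer change. It is an assumption that this matches the paper's method in spirit; the exact procedure and its size bound are not reproduced here. All names (`State`, `build_minimal_dafsa`, `accepts`, `count_states`) are hypothetical.

```python
import itertools

_ids = itertools.count()


class State:
    def __init__(self):
        self.id = next(_ids)
        self.final = False
        self.edges = {}  # char -> State; insertion order equals sorted order

    def signature(self):
        # Meaningful only once every child is a registered (canonical) state.
        return (self.final, tuple((c, t.id) for c, t in self.edges.items()))


def build_minimal_dafsa(sorted_words):
    root = State()
    register = {}  # signature -> canonical state
    prev = ""

    def replace_or_register(state):
        # Only the most recently added branch can still be unminimized.
        char = next(reversed(state.edges))
        child = state.edges[char]
        if child.edges:
            replace_or_register(child)
        state.edges[char] = register.setdefault(child.signature(), child)

    for word in sorted_words:
        if word < prev:
            raise ValueError("input must be in lexicographical order")
        # Length of the common prefix with the previous word.
        k = 0
        while k < min(len(word), len(prev)) and word[k] == prev[k]:
            k += 1
        state = root
        for c in word[:k]:
            state = state.edges[c]
        # The branch left behind is complete, so it can be minimized now.
        if state.edges:
            replace_or_register(state)
        # Append the remaining suffix as a fresh chain of states.
        for c in word[k:]:
            new_state = State()
            state.edges[c] = new_state
            state = new_state
        state.final = True
        prev = word

    if root.edges:
        replace_or_register(root)
    return root


def accepts(root, word):
    state = root
    for c in word:
        state = state.edges.get(c)
        if state is None:
            return False
    return state.final


def count_states(root):
    seen, stack = set(), [root]
    while stack:
        s = stack.pop()
        if s.id not in seen:
            seen.add(s.id)
            stack.extend(s.edges.values())
    return len(seen)
```

For the sorted list `["tap", "taps", "top", "tops"]` a plain trie needs 8 states, while the automaton built here shares the common suffixes and ends up with 5. At any moment only the path of the current word is unminimized, which is the intuition behind temporary automata staying close to the final size.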
Weighted Finite-State Transducer Algorithms: An Overview
Formal Languages and Applications, volume 148, VIII, 2004
Cited by 7 (1 self)
Weighted finite-state transducers are used in many applications such as text, speech and image processing. This chapter gives an overview of several recent weighted transducer algorithms, including composition of weighted transducers, determinization of weighted automata, a weight pushing algorithm, and minimization of weighted automata. It briefly describes these algorithms, discusses their running time complexity and conditions of application, and shows examples illustrating their application.
Minimizing local automata
Cited by 6 (2 self)
We design an algorithm that minimizes irreducible deterministic local automata by a sequence of state mergings. Two states can be merged if they have exactly the same outputs. The running time of the algorithm is O(min(m(n − r + 1), m log n)), where m is the number of edges, n the number of states of the automaton, and r the number of states of the minimized automaton. In particular, the algorithm is linear when the automaton is already minimal, in contrast to Hopcroft’s minimisation algorithm, which has an O(kn log n) running time in this case, where k is the size of the alphabet, and which applies only to complete automata. (Note that kn ≥ m.) While Hopcroft’s algorithm relies on a “negative strategy”, starting from a partition with a single class of all states and splitting classes when it is discovered that two states cannot belong to the same class, our algorithm relies on a “positive strategy”, starting from the trivial partition in which each class is a singleton. Two classes are then merged when their leaders have the same outputs. The algorithm applies to irreducible deterministic local automata, where all states are considered both initial and final. These automata, also called covers, recognize symbolic dynamical shifts of finite type. They serve to present a large class of constrained channels, the class of finite memory systems, used for channel coding purposes. The algorithm also applies to irreducible deterministic automata that are left-closing and have a synchronizing word. These automata present shifts that are called almost of finite type. Almost-of-finite-type shifts make a meaningful class of shifts, intermediate between finite type shifts and sofic shifts.
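The "positive strategy" can be illustrated with a naive fixed-point loop: start with every state in its own class and merge two classes whenever their leaders have exactly the same outgoing edges. This is only a sketch under stated assumptions. It does not achieve the running time quoted in the abstract, the function name `merge_equal_output_states` and the dictionary representation are hypothetical, and for automata outside the classes named in the abstract this kind of merging may stop before reaching the minimal automaton.

```python
def merge_equal_output_states(delta):
    """Merge states with identical outputs until nothing changes.

    delta maps each state to {label: target}. Every state is treated as both
    initial and final, so only the outgoing edges distinguish states.
    Returns the quotient automaton over the surviving class leaders.
    """
    leader = {q: q for q in delta}

    def find(q):
        # Follow leader links to the current representative of q's class.
        while leader[q] != q:
            q = leader[q]
        return q

    changed = True
    while changed:
        changed = False
        seen = {}  # output signature -> leader that owns it in this pass
        for q in delta:
            if find(q) != q:
                continue  # q was already merged into another class
            sig = frozenset((a, find(t)) for a, t in delta[q].items())
            if sig in seen:
                leader[q] = seen[sig]  # same outputs: merge q into that class
                changed = True
            else:
                seen[sig] = q

    return {
        q: {a: find(t) for a, t in delta[q].items()}
        for q in delta
        if find(q) == q
    }


if __name__ == "__main__":
    # Two-symbol memory presentation of the full shift on {a, b}:
    # state "xy" reads c and moves to "yc".
    delta = {
        x + y: {c: y + c for c in "ab"}
        for x in "ab"
        for y in "ab"
    }
    print(merge_equal_output_states(delta))
```

On this four-state example the loop merges states over three passes and ends with a single state carrying a loop for each symbol.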
Incremental construction of compact acyclic NFAs
In Proceedings of ACL 2001, 2001
Cited by 5 (3 self)
This paper presents and analyzes an incremental algorithm for the construction of Acyclic Nondeterministic Finite-state Automata (NFA). Automata of this type are quite useful in computational linguistics, especially for storing lexicons. The proposed algorithm produces compact NFAs, i.e. NFAs that do not contain equivalent states. Unlike Deterministic Finite-state Automata (DFA), this property is not sufficient to ensure minimality, but still the resulting NFAs are considerably smaller than the minimal DFAs for the same languages.
A Taxonomy of Algorithms for Constructing Minimal Acyclic Deterministic Finite Automata
Proc. Workshop on Implementing Automata, 1999
Cited by 4 (1 self)
In this paper, we present a taxonomy of algorithms for constructing minimal acyclic deterministic finite automata (MADFAs). MADFAs represent finite languages and are therefore useful in applications such as storing words for spell-checking, computer and biological virus searching, text indexing and XML tag lookup. In such applications, the automata can grow extremely large (with more than 10...
Processing Text Files as Is: Pattern Matching over Compressed Texts, Multibyte Character Texts, and Semistructured Texts
In Proc. 9th International Symposium on String Processing and Information Retrieval (SPIRE’2002), LNCS 2476, 2002
Cited by 3 (1 self)
Techniques for processing text files "as is" are presented, in which given text files are processed without modification. The compressed pattern matching problem, first defined by Amir and Benson (1992), is a good example of the "as-is" principle. Another example is string matching over multibyte character texts, which is a significant problem common to oriental languages such as Japanese, Korean, Chinese, and Taiwanese. A text file from such languages is a mixture of single-byte characters and multibyte characters. A naive solution would be (1) to convert a given text into a fixed-length encoded one and then apply any string matching routine to it; or (2) to directly search the text file byte after byte for (the encoding of) a pattern, in which case extra work is needed for synchronization to avoid false detection. Both solutions, however, sacrifice searching speed. Our algorithm runs on such a multibyte character text file at the same speed as on an ordinary ASCII text file, without false detection. The technique is applicable to any prefix code such as the Huffman code and variants of Unicode. We also generalize the technique so as to handle structured texts such as XML documents. Using this technique, we can avoid false detection of a keyword even if it is a substring of a tag name or of an attribute description, without any sacrifice of searching speed.
The Equational Theory of ω-Terms for Finite R-Trivial Semigroups
2005
Cited by 3 (3 self)
A new topological representation for free profinite R-trivial semigroups in terms of spaces of vertex-labeled complete binary trees is obtained. Such a tree may be naturally folded into a finite automaton if and only if the element it represents is an ω-term. The variety of ω-semigroups generated by all finite R-trivial semigroups, with the usual interpretation of the ω-power, is then studied. A simple infinite basis of identities is exhibited and a linear-time solution of the word problem for relatively free ω-semigroups is presented. This work is also compared with recent work of Bloom and Choffrut on transfinite words.
How to Squeeze a Lexicon
Software Practice and Experience, 2000
Cited by 3 (1 self)
Minimal acyclic deterministic finite automata (ADFAs) can be used as a compact representation of string sets with fast access time. Creating them with traditional algorithms of DFA minimization is a resource hog when a large number of strings is involved. This paper aims to popularize an efficient but little known algorithm for creating minimal ADFAs recognizing a finite language, developed independently by several authors. The algorithm is presented for three variants of ADFAs, its minor improvements are discussed, and minimal ADFAs are compared to competitive data structures.