Results 1–10 of 35
Results of the Abbadingo One DFA Learning Competition and a New Evidence-Driven State Merging Algorithm
, 1998
Abstract

Cited by 90 (1 self)
This paper first describes the structure and results of the Abbadingo One DFA Learning Competition. The competition was designed to encourage work on algorithms that scale well both to larger DFAs and to sparser training data. We then describe and discuss the winning algorithm of Rodney Price, which orders state merges according to the amount of evidence in their favor. A second winning algorithm, of Hugues Juille, will be described in a separate paper.

Part I: Abbadingo

1 Introduction

The Abbadingo One DFA Learning Competition was organized by two of the authors (Lang and Pearlmutter) and consisted of a set of challenge problems posted to the internet and token cash prizes of $1024. The organizers had the following goals:
- Promote the development of new and better algorithms.
- Encourage learning theorists to implement some of their ideas and gather empirical data concerning their performance on concrete problems which lie beyond proven bounds, particularly in the direction o...
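The evidence-driven heuristic described above can be sketched in a few lines. This is an illustrative reconstruction of the idea, not Price's competition code; the prefix-tree node representation (a `(label, children)` pair, with label `True`/`False`/`None` for accept/reject/unlabeled) is an assumption made for the sketch.

```python
# Hedged sketch of evidence-driven merge scoring: a candidate merge folds
# two prefix-tree subtrees together, and its score is the number of pairs
# of labeled states that agree; any accept/reject conflict invalidates it.
# A node is (label, children), children mapping symbols to child nodes.

def merge_score(a, b):
    """Score for merging subtree b into subtree a: number of agreeing
    labeled state pairs, or None if the merge hits a label conflict."""
    if a is None or b is None:
        return 0
    la, ca = a
    lb, cb = b
    score = 0
    if la is not None and lb is not None:
        if la != lb:
            return None          # accept/reject conflict: merge is invalid
        score += 1               # two labeled states agree: one unit of evidence
    for sym in set(ca) | set(cb):
        sub = merge_score(ca.get(sym), cb.get(sym))
        if sub is None:
            return None
        score += sub
    return score
```

An evidence-driven learner would compute this score for each candidate pair of states and perform the highest-scoring valid merge first.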
Regular Grammatical Inference from Positive and Negative Samples by Genetic Search: the GIG method
, 1994
Abstract

Cited by 38 (0 self)
We recall briefly in this paper the formal theory of regular grammatical inference from positive and negative samples of the language to be learned. We state this problem as a search toward an optimal element in a boolean lattice built from the positive information. We explain how a genetic search technique may be applied to this problem, and we introduce a new set of genetic operators. To limit the increasing complexity as the sample size grows, we propose a semi-incremental procedure. Finally, an experimental protocol to assess the performance of a regular inference technique is detailed and comparative results are given.

1 Introduction

Grammatical Inference is an instance of the Inductive Learning problem, which can be formulated as the task of discovering common structures in examples that are supposed to be generated by the same process. In this particular case, the examples are sentences defined on a specific alphabet and the common structures are represented by a gram...
Incremental Regular Inference
 Proceedings of the Third ICGI-96
, 1996
Abstract

Cited by 31 (2 self)
In this paper, we extend the characterization of the search space of regular inference [DMV94] to sequential presentations of learning data. We propose the RPNI2 algorithm, an incremental extension of the RPNI algorithm. We study the convergence and complexities of both algorithms from a theoretical and practical point of view. These results are assessed on the Feldman task.

1 Introduction

Regular inference is the problem of learning a regular language from a positive sample, that is, a finite set of strings supposed to be drawn from a target language. Whenever a negative sample, that is, a finite set of strings not belonging to the target language, is also available, the problem may be solved by the RPNI algorithm proposed by Oncina and García [OG92] and, independently, by Lang [Lan92]. The RPNI algorithm has been shown to identify in the limit any regular language with polynomial complexity as a function of the positive and negative sample sizes. However, this algorithm requir...
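RPNI and its incremental extension both start from the prefix-tree acceptor (PTA) of the positive sample and then merge states while every negative string stays rejected. A minimal sketch of the PTA construction, the common starting point, follows; it is illustrative and not taken from either paper, and the encoding (state 0 as root, transitions as per-state dicts) is an assumption.

```python
# Hedged sketch: the prefix-tree acceptor that RPNI-style algorithms
# generalize by state merging. trans[q][sym] gives the successor state.

def build_pta(positive):
    """Prefix-tree acceptor of a positive sample; state 0 is the root."""
    trans = [{}]
    accepting = set()
    for w in positive:
        q = 0
        for sym in w:
            if sym not in trans[q]:
                trans.append({})
                trans[q][sym] = len(trans) - 1
            q = trans[q][sym]
        accepting.add(q)         # every sample string ends in an accepting state
    return trans, accepting

def accepts(trans, accepting, w):
    """Membership test in the (unmerged) PTA."""
    q = 0
    for sym in w:
        if sym not in trans[q]:
            return False
        q = trans[q][sym]
    return q in accepting
```

For example, `build_pta(["ab", "a"])` accepts exactly "a" and "ab"; RPNI would then try folding PTA states together, keeping each merge only if no negative string becomes accepted.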
Grammar Inference, Automata Induction, and Language Acquisition
 Handbook of Natural Language Processing
, 2000
Abstract

Cited by 22 (3 self)
The natural language learning problem has attracted the attention of researchers for several decades. Computational and formal models of language acquisition have provided some preliminary, yet promising, insights into how children learn the language of their community. Further, these formal models also provide an operational framework for the numerous practical applications of language learning. We survey some of the key results in formal language learning. In particular, we discuss the prominent computational approaches for learning different classes of formal languages and discuss how these fit into the broad context of natural language learning.
Learning DFA from Simple Examples
, 1997
Abstract

Cited by 20 (6 self)
Efficient learning of DFA is a challenging research problem in grammatical inference. It is known that both exact and approximate (in the PAC sense) identifiability of DFA is hard. Pitt, in his seminal paper, posed the following open research problem: "Are DFA PAC-identifiable if examples are drawn from the uniform distribution, or some other known simple distribution?" [25]. We demonstrate that the class of simple DFA (i.e., DFA whose canonical representations have logarithmic Kolmogorov complexity) is efficiently PAC learnable under the Solomonoff-Levin universal distribution. We prove that if the examples are sampled at random according to the universal distribution by a teacher that is knowledgeable about the target concept, the entire class of DFA is efficiently PAC learnable under the universal distribution. Thus, we show that DFA are efficiently learnable under the PACS model [6]. Further, we prove that any concept that is learnable under Gold's model for learning from characteristic samples, Goldman and Mathias' polynomial teachability model, and the model for learning from example-based queries is also learnable under the PACS model.
An Incremental Interactive Algorithm for Regular Grammar Inference
 Proceedings of the Third ICGI-96
, 1996
Abstract

Cited by 14 (6 self)
We present provably correct interactive algorithms for learning regular grammars from positive examples and membership queries. A structurally complete set of strings from a language L(G) corresponding to a target regular grammar G implicitly specifies a lattice of finite state automata (FSA) which contains an FSA M_G corresponding to G. The lattice is compactly represented as a version space, and M_G is identified by searching the version space using membership queries. We explore the problem of regular grammar inference in a setting where positive examples are provided intermittently. We provide an incremental version of the algorithm along with a set of sufficient conditions for its convergence.

1 Introduction

Regular Grammar Inference [3, 5, 9, 12] is an important machine learning problem with applications in pattern recognition and language acquisition. It is defined as the process of learning an unknown regular grammar (G) given a finite set of positive examples S+, possibly...
Probabilistic DFA Inference using Kullback-Leibler Divergence and Minimality
 In Seventeenth International Conference on Machine Learning
, 2000
Abstract

Cited by 11 (2 self)
Probabilistic DFA inference is the problem of inducing a stochastic regular grammar from a positive sample of an unknown language. The ALERGIA algorithm is one of the most successful approaches to this problem. In the present work we review this algorithm and explain why its generalization criterion, a state merging operation, is purely local. This characteristic leads to the conclusion that there is no explicit way to bound the divergence between the distribution defined by the solution and the training set distribution (that is, to control globally the generalization from the training sample). In this paper we present an alternative approach, the MDI algorithm, in which the solution is a probabilistic automaton that trades off minimal divergence from the training sample and minimal size. An efficient computation of the Kullback-Leibler divergence between two probabilistic DFAs is described, from which the new learning criterion is derived. Empirical results in the d...
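The divergence criterion above can be illustrated with a brute-force, truncated Kullback-Leibler computation between two probabilistic DFAs; the paper derives an efficient exact computation, which this sketch does not reproduce. The PDFA encoding here (per-state transition probabilities plus a halting probability) is an assumption for the sketch.

```python
# Hedged sketch: truncated KL divergence between two PDFAs by enumerating
# all strings up to a length bound. A PDFA is (trans, stop), where
# trans[q][sym] = (next_state, prob) and stop[q] is the halting probability.
from itertools import product
from math import log

def string_prob(pdfa, w):
    """Probability the PDFA generates string w and then halts."""
    trans, stop = pdfa
    q, p = 0, 1.0
    for sym in w:
        if sym not in trans[q]:
            return 0.0
        q, pr = trans[q][sym][0], trans[q][sym][1]
        p *= pr
    return p * stop[q]

def truncated_kl(P, Q, alphabet, max_len):
    """Sum of P(w) * log(P(w)/Q(w)) over all strings of length <= max_len.
    Raises ZeroDivisionError if Q assigns 0 where P does not (KL is infinite)."""
    total = 0.0
    for n in range(max_len + 1):
        for w in product(alphabet, repeat=n):
            p, q = string_prob(P, w), string_prob(Q, w)
            if p > 0:
                total += p * log(p / q)
    return total
```

Two identical PDFAs give a truncated divergence of zero; MDI uses the (exact) divergence from the training-sample distribution, traded off against automaton size, to decide which merges to accept.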
The QSM algorithm and its application to software behavior model induction
 APPLIED ARTIFICIAL INTELLIGENCE
, 2008
Abstract

Cited by 10 (2 self)
This article presents a novel application of grammatical inference techniques to the synthesis of behavior models of software systems. This synthesis is used for the elicitation of software requirements. The problem is formulated as a deterministic finite-state automaton induction problem from positive and negative scenarios provided by an end user of the software-to-be. A query-driven state merging (QSM) algorithm is proposed. It extends the Regular Positive and Negative Inference (RPNI) and blue-fringe algorithms by allowing membership queries to be submitted to the end user. State merging operations can be further constrained by prior domain knowledge formulated as fluents, goals, domain properties, and models of external software components. The incorporation of domain knowledge both reduces the number of queries and guarantees that the induced model is consistent with such knowledge. The proposed techniques are implemented in the ISIS tool, and practical evaluations on standard requirements engineering test cases and synthetic data demonstrate the value of this approach.
Faster Algorithms for Finding Minimal Consistent DFAs
, 1999
Abstract

Cited by 8 (0 self)
 Add to MetaCart
We describe exbar, a powerful new algorithm for the exact inference of minimal deterministic automata from given training data. This algorithm achieves the highest performance yet on a set of graded benchmark problems posted by Arlindo Oliveira. In addition, we note that the inexact program ed-beam, which resulted from the Abbadingo DFA learning competition, also does well on this benchmark set, despite being designed for larger problems.

1 Introduction

The problem of finding the smallest deterministic finite automaton that is consistent with a given set of positive and negative examples has been proved NP-hard by several people, notably Gold, Angluin, and Pitt and Warmuth. Subsequent researchers have adopted several strategies for making progress in the face of these worst-case intractability results:
1. Exploring new learning paradigms, including teachers that can select particularly informative training examples or answer queries.
2. Identifying classes of pr...
How considering incompatible state mergings may reduce the DFA induction search tree
 Fourth International Colloquium on Grammatical Inference (ICGI'98)
, 1998
Abstract

Cited by 6 (1 self)
A simple and effective method for DFA induction from positive and negative samples is the state merging method. The corresponding search space may be tree-structured by considering two subspaces for a given pair of states: the subspace where the states are merged and the subspace where they remain different. Choosing different pairs leads to different sizes of space, due to state merging dependencies. Thus, ordering the successive choices of these pairs is an important issue. Starting from a constraint characterization of incompatible state mergings, we show that this characterization allows us to make better choices, i.e. to reduce the size of the search tree. Within this framework, we address the issue of learning the set of all minimal compatible DFAs. We propose a pruning criterion and experiment with several ordering criteria. The prefix order and a new entropy-based criterion exhibit the best results in our test sets.

Keywords: grammatical inference, DFA, constraint...
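One common way to characterize incompatible state pairs, in the spirit of the abstract above (an illustrative formulation, not the paper's own constraint system), is a fixpoint computation: two states cannot be merged if their labels conflict, or if some symbol leads them to a pair that itself cannot be merged. Knowing these pairs up front lets a search prune merge branches before exploring them.

```python
# Hedged sketch: compute the set of state pairs that can never be merged.
# trans[q] maps symbols to successor states; accepting/rejecting are the
# labeled state sets (unlabeled states appear in neither).

def incompatible_pairs(trans, accepting, rejecting):
    n = len(trans)
    # Base case: directly conflicting labels.
    inc = {(p, q) for p in range(n) for q in range(n)
           if (p in accepting and q in rejecting)
           or (p in rejecting and q in accepting)}
    changed = True
    while changed:                      # propagate backwards to a fixpoint
        changed = False
        for p in range(n):
            for q in range(n):
                if (p, q) in inc:
                    continue
                for sym in trans[p]:
                    if sym in trans[q] and (trans[p][sym], trans[q][sym]) in inc:
                        inc.add((p, q))
                        inc.add((q, p))
                        changed = True
                        break
    return inc
```

In a tree-structured merge search, any candidate pair found in this set can be skipped outright, and the choice of which remaining pair to branch on (prefix order, entropy-based, etc.) is exactly the ordering question the paper studies.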