Results of the Abbadingo One DFA Learning Competition and a New Evidence-Driven State Merging Algorithm
, 1998
Cited by 90 (1 self)
This paper first describes the structure and results of the Abbadingo One DFA Learning Competition. The competition was designed to encourage work on algorithms that scale well, both to larger DFAs and to sparser training data. We then describe and discuss the winning algorithm of Rodney Price, which orders state merges according to the amount of evidence in their favor. A second winning algorithm, of Hugues Juillé, will be described in a separate paper. Part I: Abbadingo. 1 Introduction. The Abbadingo One DFA Learning Competition was organized by two of the authors (Lang and Pearlmutter) and consisted of a set of challenge problems posted to the internet and token cash prizes of $1024. The organizers had the following goals: promote the development of new and better algorithms; encourage learning theorists to implement some of their ideas and gather empirical data concerning their performance on concrete problems which lie beyond proven bounds, particularly in the direction o...
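As a rough illustration of ordering merges by evidence, the sketch below scores pairs of nodes in a prefix tree built from a labeled sample. It is a simplified assumption for illustration, not Price's implementation: the score is taken to be the number of suffixes that are labeled below both nodes with the same label, a pair with any conflicting label is rejected, and the names `labeled_suffixes`, `evidence` and `rank_merges` are hypothetical. A full algorithm would score merges in a partially merged hypothesis rather than in the raw prefix tree.

```python
def labeled_suffixes(sample, prefix):
    """Suffixes found below `prefix` in the prefix tree, with accept/reject labels."""
    return {w[len(prefix):]: label
            for w, label in sample.items() if w.startswith(prefix)}


def evidence(sample, p, q):
    """Count label agreements if nodes p and q were merged; None on a conflict."""
    below_p = labeled_suffixes(sample, p)
    below_q = labeled_suffixes(sample, q)
    shared = below_p.keys() & below_q.keys()
    if any(below_p[s] != below_q[s] for s in shared):
        return None  # one string would be both accepted and rejected
    return len(shared)


def rank_merges(sample):
    """List all conflict-free node pairs, most strongly supported first."""
    prefixes = sorted({w[:i] for w in sample for i in range(len(w) + 1)},
                      key=lambda p: (len(p), p))
    scored = []
    for i, p in enumerate(prefixes):
        for q in prefixes[i + 1:]:
            score = evidence(sample, p, q)
            if score is not None:
                scored.append((score, p, q))
    # sorted() is stable, so ties keep their shortest-prefix-first order
    return sorted(scored, key=lambda item: -item[0])


# Toy sample (an assumption): True marks accepted strings, False rejected ones.
sample = {"": True, "ab": True, "abab": True, "a": False, "aba": False}
for score, p, q in rank_merges(sample):
    print(score, repr(p), repr(q))
```

On this toy sample the best-supported pair is the root and the node reached by "ab", which is consistent with a target language of repeated "ab" blocks.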
What is the Search Space of the Regular Inference?
 In Proceedings of the Second International Colloquium on Grammatical Inference (ICGI'94)
, 1994
Cited by 45 (5 self)
This paper revisits the theory of regular inference, in particular by extending the definition of structural completeness of a positive sample and by demonstrating two basic theorems. This framework makes it possible to state the regular inference problem as a search through a boolean lattice built from the positive sample. Several properties of the search space are studied and generalization criteria are discussed. In this framework, the concept of border set is introduced, that is, the set of the most general solutions excluding a negative sample. Finally, the complexity of regular language identification is discussed from both a theoretical and a practical point of view. 1 Introduction. Regular inference is the process of learning a regular language from a set of examples, consisting of a positive sample, i.e. a finite subset of a regular language. A negative sample, i.e. a finite set of strings not belonging to this language, may also be available. This problem has been studied as early as th...
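One common way to picture such a search space is as the set of automata obtained by merging states of a prefix tree acceptor built from the positive sample, where each partition of the states gives one candidate. The sketch below illustrates that reading under stated assumptions; it is not code from the paper, and the function names and the toy sample are hypothetical.

```python
def build_pta(positive):
    """Prefix tree acceptor: one state per distinct prefix of the positive sample."""
    prefixes = sorted({w[:i] for w in positive for i in range(len(w) + 1)},
                      key=lambda p: (len(p), p))
    state_of = {p: i for i, p in enumerate(prefixes)}
    transitions = {(state_of[p[:-1]], p[-1], state_of[p]) for p in prefixes if p}
    accepting = {state_of[w] for w in positive}
    return len(prefixes), transitions, accepting


def quotient(transitions, accepting, block_of):
    """Merge states according to a partition given as a state -> block map."""
    merged = {(block_of[s], a, block_of[t]) for s, a, t in transitions}
    return merged, {block_of[s] for s in accepting}


def accepts(transitions, accepting, start, word):
    """Simulate a possibly nondeterministic automaton on `word`."""
    current = {start}
    for symbol in word:
        current = {t for s, a, t in transitions if s in current and a == symbol}
    return bool(current & accepting)


# Toy positive sample (an assumption): the PTA has states 0..4 for "", a, ab, aba, abab.
n_states, trans, acc = build_pta(["ab", "abab"])
# Partition {0, 2, 4} and {1, 3}: the quotient generalizes to repeated "ab" blocks.
q_trans, q_acc = quotient(trans, acc, {0: 0, 1: 1, 2: 0, 3: 1, 4: 0})
print(accepts(trans, acc, 0, "ababab"), accepts(q_trans, q_acc, 0, "ababab"))
```

A negative sample prunes this space: a partition is admissible only if its quotient rejects every negative string, for example "aba" here.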
Regular Grammatical Inference from Positive and Negative Samples by Genetic Search: the GIG method
, 1994
Cited by 38 (0 self)
We briefly recall in this paper the formal theory of regular grammatical inference from positive and negative samples of the language to be learned. We state this problem as a search toward an optimal element in a boolean lattice built from the positive information. We explain how a genetic search technique may be applied to this problem and we introduce a new set of genetic operators. To limit the increasing complexity as the sample size grows, we propose a semi-incremental procedure. Finally, an experimental protocol to assess the performance of a regular inference technique is detailed and comparative results are given. 1 Introduction. Grammatical Inference is an instance of the Inductive Learning problem which can be formulated as the task of discovering common structures in examples which are supposed to be generated by the same process. In this particular case, the examples are sentences defined on a specific alphabet and the common structures are represented by a gram...
The sk-strings method for inferring PFSA
 In Proceedings of the
, 1997
Cited by 31 (2 self)
We describe a simple, fast and easy-to-implement recursive algorithm with four alternative intuitive heuristics for inferring Probabilistic Finite State Automata. The algorithm is an extension to stochastic machines of the k-tails method introduced in 1972 by Biermann and Feldman for non-stochastic machines. Experiments comparing the two are done and benchmark results are also presented. It is also shown that sk-strings performs better than k-tails, at least in inferring small automata. Introduction. When given a finite number of examples of the behaviour of a probabilistic state-determined machine, it is possible to imagine methods by which we can infer its structure. Ideally, we would like to identify the exact automaton which generated the strings. But it is impossible to do this from the behaviour of the machine because more than one non-minimal machine may generate the same language. This paper is concerned not with identifying the generating machine, which is demonstrably impossib...
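For orientation, the non-stochastic k-tails idea mentioned above can be sketched as follows: the k-tail of a state is the set of strings of length at most k that lead from it to acceptance, and states with identical k-tails are candidates for merging. This is a minimal sketch on a prefix tree of the sample, with hypothetical function names; it does not implement the probabilistic sk-strings heuristics.

```python
def k_tails(positive, k):
    """Map each prefix (a prefix-tree state) to its accepted suffixes of length <= k."""
    prefixes = {w[:i] for w in positive for i in range(len(w) + 1)}
    return {
        p: frozenset(w[len(p):] for w in positive
                     if w.startswith(p) and len(w) - len(p) <= k)
        for p in prefixes
    }


def k_tail_classes(positive, k):
    """Group prefixes whose k-tails coincide; each group would become one state."""
    groups = {}
    for prefix, tail in k_tails(positive, k).items():
        groups.setdefault(tail, []).append(prefix)
    ordered = [sorted(g, key=lambda p: (len(p), p)) for g in groups.values()]
    return sorted(ordered, key=lambda g: (len(g[0]), g[0]))


# Toy sample (an assumption): with k = 2 the states for "a" and "aba" share the tail {"b"}.
print(k_tail_classes(["ab", "abab"], 2))
```

Smaller values of k merge more aggressively, since fewer suffixes are available to tell two states apart.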
Incremental Regular Inference
 Proceedings of the Third ICGI96
, 1996
Cited by 31 (2 self)
In this paper, we extend the characterization of the search space of regular inference [DMV94] to sequential presentations of learning data. We propose the RPNI2 algorithm, an incremental extension of the RPNI algorithm. We study the convergence and complexities of both algorithms from a theoretical and practical point of view. These results are assessed on the Feldman task. 1 Introduction. Regular inference is the problem of learning a regular language from a positive sample, that is, a finite set of strings supposed to be drawn from a target language. Whenever a negative sample, that is, a finite set of strings not belonging to the target language, is also available, the problem may be solved by the RPNI algorithm proposed by Oncina and García [OG92] and, independently, by Lang [Lan92]. The RPNI algorithm has been shown to identify in the limit any regular language with polynomial complexity as a function of the positive and negative sample sizes. However, this algorithm requir...
From dirt to shovels: Fully automatic tool generation from ad hoc data
 In POPL
, 2008
Cited by 30 (9 self)
An ad hoc data source is any semi-structured data source for which useful data analysis and transformation tools are not readily available. Such data must be queried, transformed and displayed by systems administrators, computational biologists, financial analysts and hosts of others on a regular basis. In this paper, we demonstrate that it is possible to generate a suite of useful data processing tools, including a semi-structured query engine, several format converters, a statistical analyzer and data visualization routines directly from the ad hoc data itself, without any human intervention. The key technical contribution of the work is a multi-phase algorithm that automatically infers the structure of an ad hoc data source and produces a format specification in the PADS data description language. Programmers wishing to implement custom data analysis tools can use such descriptions to generate printing and parsing libraries for the data. Alternatively, our software infrastructure will push these descriptions through the PADS compiler and automatically generate fully functional tools. We evaluate the performance of our inference algorithm, showing that it scales linearly in the size of the training data and completes in seconds, as opposed to the hours or days it takes to write a description by hand. We also evaluate the correctness of the algorithm, demonstrating that generating accurate descriptions often requires less than 5% of the available data.
Regular Model Checking Using Inference of Regular Languages
, 2004
Cited by 29 (2 self)
Regular model checking is a method for verifying infinite-state systems based on coding their configurations as words over a finite alphabet, sets of configurations as finite automata, and transitions as finite transducers. We introduce a new general approach to regular model checking based on inference of regular languages. The method builds upon the observation that for infinite-state systems whose behaviour can be modelled using length-preserving transducers, there is a finite computation for obtaining all reachable configurations up to a certain length n. These configurations are a (positive) sample of the reachable configurations of the given system, whereas all other words up to length n are a negative sample. Then, methods of inference of regular languages can be used to generalize the sample to the full reachability set (or an overapproximation of it). We have implemented our method in a prototype tool which shows that our approach is competitive on a number of concrete examples. Furthermore, in contrast to all other existing regular model checking methods, termination is guaranteed in general for all systems with regular sets of reachable configurations. The method can also be applied in a similar way to reachability relations instead of reachability sets.
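The sample construction described above can be sketched for a toy system. Everything concrete here is an assumption for illustration: the token-passing transition `step`, the initial configuration `initial`, and the choice to start at length 1. The only point taken from the abstract is that a length-preserving transition relation makes the reachable set finite for each length, so it can be enumerated and its complement used as the negative sample.

```python
from itertools import product


def step(word):
    """Hypothetical length-preserving transition: a token '1' moves one cell right."""
    return {word[:i] + "01" + word[i + 2:]
            for i in range(len(word) - 1) if word[i:i + 2] == "10"}


def initial(length):
    """Hypothetical initial configuration: the token sits in the leftmost cell."""
    return "1" + "0" * (length - 1)


def samples(initial, successors, alphabet, n):
    """Positive sample: reachable words of length 1..n; negative sample: all others."""
    positive = set()
    for length in range(1, n + 1):
        frontier = {initial(length)}
        seen = set(frontier)
        while frontier:  # terminates because only finitely many words have this length
            frontier = {v for w in frontier for v in successors(w)} - seen
            seen |= frontier
        positive |= seen
    universe = {"".join(t) for length in range(1, n + 1)
                for t in product(alphabet, repeat=length)}
    return positive, universe - positive


positive, negative = samples(initial, step, "01", 3)
print(sorted(positive, key=lambda w: (len(w), w)))
```

Feeding such a pair of samples to a regular inference procedure would then propose a candidate for the full reachability set, here the words containing exactly one token.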
Generating Annotated Behavior Models From End-User Scenarios
 IEEE Transactions on Software Engineering
, 2005
Cited by 25 (6 self)
Requirements-related scenarios capture typical examples of system behaviors through sequences of desired interactions between the software-to-be and its environment. Their concrete, narrative style of expression makes them very effective for eliciting software requirements and for validating behavior models. However, scenarios raise coverage problems as they only capture partial histories of interaction among system component instances. Moreover, they often leave the actual requirements implicit. Numerous efforts have therefore been made recently to synthesize requirements or behavior models inductively from scenarios. Two problems arise from those efforts. On the one hand, the scenarios must be complemented with additional input such as state assertions along episodes or flowcharts on such episodes. This makes such techniques difficult to use by the non-expert end-users who provide the scenarios. On the other hand, the generated state machines may be hard to understand as their nodes generally convey no domain-specific properties. Their validation by analysts, complementary to model checking and animation by tools, may therefore be quite difficult. This paper describes tool-supported techniques that overcome those two problems. Our tool generates a labeled transition system (LTS) for each system component from simple forms of message sequence charts (MSC) taken as examples or counterexamples of desired behavior. No additional input is required. A global LTS for the entire system is synthesized first. This LTS covers all scenario examples and excludes all counterexamples. It is inductively generated through an interactive procedure that extends known learning techniques for grammar induction. The procedure is incremental on training examples. It interactively produces additional scenarios that the end-user has to classify as examples or counterexamples of desired behavior. The LTS synthesis procedure may thus also be used independently for requirements elicitation through scenario questions generated by the tool. The synthesized system LTS is then projected on local LTS for each system component. For model validation by analysts, the tool generates state invariants that decorate the nodes of the local LTS.
Learning Regular Languages From Simple Positive Examples
, 2000
Cited by 23 (0 self)
Learning from positive data constitutes an important topic in Grammatical Inference since it is believed that the acquisition of grammar by children only needs syntactically correct (i.e. positive) instances. However, classical learning models provide no way to avoid the problem of overgeneralization. In order to overcome this problem, we use here a learning model from simple examples, where the notion of simplicity is defined with the help of Kolmogorov complexity. We show that a general and natural heuristic which allows learning from simple positive examples can be developed in this model. Our main result is that the class of regular languages is probably exactly learnable from simple positive examples.
Grammar Inference, Automata Induction, and Language Acquisition
 Handbook of Natural Language Processing
, 2000
Cited by 22 (3 self)
The natural language learning problem has attracted the attention of researchers for several decades. Computational and formal models of language acquisition have provided some preliminary yet promising insights into how children learn the language of their community. Further, these formal models also provide an operational framework for the numerous practical applications of language learning. We will survey some of the key results in formal language learning. In particular, we will discuss the prominent computational approaches for learning different classes of formal languages and discuss how these fit in the broad context of natural language learning.