## Grammar Inference, Automata Induction, and Language Acquisition (2000)

Venue: Handbook of Natural Language Processing

Citations: 21 (1 self)

### BibTeX

@INPROCEEDINGS{Parekh00grammarinference,

author = {Rajesh Parekh and Vasant Honavar},

title = {Grammar Inference, Automata Induction, and Language Acquisition},

booktitle = {Handbook of Natural Language Processing},

year = {2000},

pages = {727--764},

publisher = {Marcel Dekker}

}

### Abstract

The natural language learning problem has attracted the attention of researchers for several decades. Computational and formal models of language acquisition have provided some preliminary, yet promising, insights into how children learn the language of their community. Further, these formal models also provide an operational framework for the numerous practical applications of language learning. We survey some of the key results in formal language learning. In particular, we discuss the prominent computational approaches for learning different classes of formal languages and examine how these fit into the broad context of natural language learning.

### Citations

3854 |
Introduction to Automata Theory, Languages and Computation
- Hopcroft, Ullman
- 1979
Citation Context ...of language acquisition. Research in language acquisition has benefited from advances in several disciplines including cognitive psychology [9], linguistics [15, 16, 75], theoretical computer science [42, 55], computational learning theory [45, 63], artificial intelligence [84], machine learning [48, 61], and pattern recognition [30, 57]. Psychological studies of language acquisition have explored the dev... |

3749 |
Artificial Intelligence: A Modern Approach
- Russell, Norvig
- 2002
Citation Context ...om advances in several disciplines including cognitive psychology [9], linguistics [15, 16, 75], theoretical computer science [42, 55], computational learning theory [45, 63], artificial intelligence [84], machine learning [48, 61], and pattern recognition [30, 57]. Psychological studies of language acquisition have explored the development of the children's cognitive faculties, the nature of stimuli ... |

1706 | A theory of the learnable
- Valiant
- 1984
Citation Context ... An interesting question would be to see whether DFA are approximately learnable. Valiant's distribution-independent model of learning, also called the probably approximately correct model (PAC model) [93], is a widely used model for approximate learning. When adapted to the problem of learning DFA, the goal of a PAC learning algorithm is to obtain from a randomly drawn set of labeled examples, in polyn... |
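The PAC criterion quoted here can be made concrete with the standard sample-size bound for a consistent learner over a finite hypothesis class H: m ≥ (1/ε)(ln |H| + ln(1/δ)). The sketch below is an illustration only; the crude DFA count (which ignores state symmetries) and the parameter values are assumptions, not taken from the chapter.

```python
import math

def pac_sample_bound(hyp_count: int, epsilon: float, delta: float) -> int:
    """Standard PAC bound for a consistent learner over a finite
    hypothesis class H: m >= (1/epsilon) * (ln|H| + ln(1/delta))."""
    return math.ceil((math.log(hyp_count) + math.log(1.0 / delta)) / epsilon)

def dfa_class_size(n_states: int, alphabet_size: int) -> int:
    """Crude count of DFAs with a fixed start state: n^(n*k) transition
    functions times 2^n choices of accepting-state sets."""
    return n_states ** (n_states * alphabet_size) * 2 ** n_states

# Example: 5-state DFAs over a binary alphabet, epsilon = delta = 0.05
m = pac_sample_bound(dfa_class_size(5, 2), 0.05, 0.05)
```

The bound grows only logarithmically in |H|, which is why even the astronomically large DFA class yields a modest sample size; the hardness results cited in this entry concern the computational cost of finding a consistent DFA, not the sample size.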

1694 | An Introduction to Kolmogorov Complexity and its Applications
- Li, Vitanyi
- 1993
Citation Context ...dditive constant. Thus, the Kolmogorov complexity of an object with respect to the Universal Turing machine is treated as the Kolmogorov complexity of the object. The interested reader is referred to [54] for a complete treatment of Kolmogorov complexity and related topics. The Solomonoff-Levin universal distribution m assigns high probability to objects that are simple, i.e., m(x) ≥ 2^(−K(x)). If we... |
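Since K(x) is uncomputable, a common practical stand-in is the length of a lossless compression of x. The snippet below uses zlib as that proxy (an assumption for illustration, not part of the chapter) to show that "simple", repetitive strings receive much shorter codes, which is exactly what m(x) ≥ 2^(−K(x)) rewards with higher probability.

```python
import os
import zlib

def compressed_length(x: bytes) -> int:
    """zlib code length: a crude, computable upper-bound proxy for
    the (uncomputable) Kolmogorov complexity K(x)."""
    return len(zlib.compress(x, 9))

simple = b"ab" * 500            # highly regular: tiny proxy complexity
random_ish = os.urandom(1000)   # incompressible with high probability
```

Under the universal distribution, the 1000-byte string `simple` would receive exponentially more probability mass than a typical random string of the same length.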

1552 | Finding structure in time
- Elman
- 1990
Citation Context ... recurrent neural networks (RNN) [62]. A variety of RNN architectures have been investigated for learning grammars from a set of positive and negative examples of the target language (see for example [17, 18, 25, 26, 31, 34, 32, 58, 78, 85, 97]). In the following section we describe the second order recurrent neural network architecture due to Giles et al. [31]. 3.6.1 Second Order RNN for Regular Grammar Inference Recurrent neural networks h... |
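The second-order architecture referred to here multiplies the current state vector against a one-hot input symbol through a 3-tensor of weights: s_i(t+1) = g(Σ_{j,k} W[i,j,k]·s_j(t)·x_k(t)). A minimal sketch of that step follows; the random weights, the start state, and the sigmoid choice of g are illustrative assumptions.

```python
import numpy as np

def second_order_step(W, s, x):
    """One state transition of a second-order RNN:
    s_i(t+1) = sigmoid( sum_{j,k} W[i, j, k] * s[j] * x[k] ),
    with x a one-hot encoding of the current input symbol."""
    return 1.0 / (1.0 + np.exp(-np.einsum('ijk,j,k->i', W, s, x)))

rng = np.random.default_rng(0)
n_states, n_symbols = 4, 2
W = rng.normal(scale=0.5, size=(n_states, n_states, n_symbols))

s = np.array([1.0, 0.0, 0.0, 0.0])   # conventional start state
for symbol in [0, 1, 1, 0]:          # the string "abba", one-hot encoded
    x = np.eye(n_symbols)[symbol]
    s = second_order_step(W, s, x)
```

In grammar-inference experiments a designated "response" neuron is typically thresholded after the last symbol to produce the accept/reject decision.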

894 |
Language identification in the limit
- Gold
- 1967
Citation Context ...ejects all negative examples. 3.2 Results in Regular Grammar Inference Regular grammar inference is a hard problem in that regular grammars cannot be correctly identified from positive examples alone [36]. Further, it has been shown that there exists no efficient learning algorithm for identifying the minimum-state DFA that is consistent with an arbitrary set of positive and negative examples [37]. Eff... |

839 | An introduction to hidden Markov models
- Rabiner, Juang
- 1986
Citation Context ...tic finite state automata where both the state transitions and the output symbols are governed by probability distributions. HMMs have been applied successfully in speech recognition and cryptography [81]. Formally, a HMM comprises the following elements: • a finite set of states Q (labeled q_1, q_2, ..., q_N) • a finite alphabet Σ (of symbols labeled σ_1, σ_2, ..., σ_M) • th... |
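Given those elements plus an initial distribution π, a transition matrix A, and an emission matrix B, the probability that the model generates an observation sequence is computed by the forward algorithm. A minimal sketch (the parameter values below are made up for illustration):

```python
import numpy as np

def forward(pi, A, B, obs):
    """Forward algorithm: returns P(obs | HMM), where
    pi[i] = P(start in state i), A[i, j] = P(i -> j),
    B[i, k] = P(state i emits symbol k)."""
    alpha = pi * B[:, obs[0]]          # alpha[i] = P(prefix, state i)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # propagate, then emit
    return alpha.sum()

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])
```

A quick sanity check: the probabilities of all observation sequences of a fixed length sum to one.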

787 |
A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains
- Baum, Petrie, et al.
- 1970
Citation Context ...he model parameters that would maximize the probability of a single observation sequence. In order to determine the model parameters an iterative procedure called the Baum-Welch algorithm can be used [7]. Stolcke and Omohundro present a more general approach to the HMM learning problem [88]. Their approach is a Bayesian model merging strategy which facilitates learning the HMM structure as well as th... |
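One Baum-Welch iteration re-estimates (π, A, B) from the expected state occupancies (γ) and expected transitions (ξ) produced by a forward-backward pass; being an EM step, it can never decrease the likelihood of the training sequence. A compact sketch (the observation sequence and initial parameters are illustrative assumptions):

```python
import numpy as np

def likelihood(pi, A, B, obs):
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()

def baum_welch_step(pi, A, B, obs):
    """One EM re-estimation step for a discrete HMM (Baum-Welch)."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))                 # forward pass
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    beta = np.zeros((T, N))                  # backward pass
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    like = alpha[-1].sum()
    gamma = alpha * beta / like              # P(state at time t | obs)
    xi = np.array([alpha[t][:, None] * A * (B[:, obs[t + 1]] * beta[t + 1])[None, :]
                   for t in range(T - 1)]) / like
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.vstack([gamma[np.array(obs) == k].sum(axis=0) / gamma.sum(axis=0)
                       for k in range(B.shape[1])]).T
    return new_pi, new_A, new_B

obs = [0, 1, 1, 0, 0, 1, 0]
pi = np.array([0.5, 0.5])
A = np.array([[0.6, 0.4], [0.3, 0.7]])
B = np.array([[0.7, 0.3], [0.2, 0.8]])
new_pi, new_A, new_B = baum_welch_step(pi, A, B, obs)
```

Note that this re-estimates only the parameters; the structure-learning extension attributed to Stolcke and Omohundro merges states of an initially over-specific model, which is not attempted here.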

596 |
An Introduction to Computational Learning Theory
- Kearns, Vazirani
- 1994
Citation Context ...guage acquisition has benefited from advances in several disciplines including cognitive psychology [9], linguistics [15, 16, 75], theoretical computer science [42, 55], computational learning theory [45, 63], artificial intelligence [84], machine learning [48, 61], and pattern recognition [30, 57]. Psychological studies of language acquisition have explored the development of the children's cognitive fac... |

554 |
Generalization as search
- Mitchell
- 1982
Citation Context ...tart and a knowledgeable teacher is available to answer the membership queries posed by the learner. The lattice of finite state automata defined above is implicitly represented using a version space [60]. The version space contains all the elements of the lattice that are consistent with the labeled examples and the membership queries. It is represented using two sets of finite state automata: a set ... |
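The boundary-set representation is easiest to see on a toy one-dimensional hypothesis class; the stand-in below (threshold concepts h(x) = (x ≥ θ), an illustrative assumption, not the chapter's FSA lattice) shows how a most general and a most specific boundary jointly delimit all consistent hypotheses.

```python
def version_space_1d(examples):
    """Toy version space for threshold concepts h(x) = (x >= theta).
    Returns (g, s): the most general and most specific consistent
    boundaries; every theta in (g, s] is consistent with the data."""
    pos = [x for x, label in examples if label]
    neg = [x for x, label in examples if not label]
    s = min(pos)                             # most specific boundary
    g = max(neg) if neg else float('-inf')   # most general boundary
    if g >= s:
        raise ValueError("no consistent hypothesis")
    return g, s
```

In the FSA setting the same idea applies, except that the S and G sets hold automata rather than scalar thresholds, and membership-query answers shrink the space from both ends.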

519 |
The language instinct
- Pinker
- 1994
Citation Context ...anguage on the basis of a finite number of sentences that they encounter during their formative years. Consequently, natural language acquisition has been, and continues to be, a major focus of research [48, 75]. The past decade has seen significant theoretical as well as experimental advances in the study of natural language acquisition. Examples of some current developments include bootstrapping hypotheses... |

512 |
Learning regular sets from queries and counterexamples
- Angluin
- 1987
Citation Context ...and equivalence queries of the form "Is this DFA equivalent to the target?" Using labeled examples together with membership and equivalence queries it is possible to correctly identify the target DFA [3]. 3.3 Search Space Regular grammar inference can be formulated as a search problem in the space of all finite state automata (FSA). Clearly, the space of all FSA is infinite. One way to restrict the... |

470 |
Finite-state languages
- Chomsky, Miller
- 1958
Citation Context ...ibly be answered by studying formal models of language acquisition. Research in language acquisition has benefited from advances in several disciplines including cognitive psychology [9], linguistics [15, 16, 75], theoretical computer science [42, 55], computational learning theory [45, 63], artificial intelligence [84], machine learning [48, 61], and pattern recognition [30, 57]. Psychological studies of lan... |

382 | Optimality theory
- Prince, Smolensky
- 1993
Citation Context ...well as experimental advances in the study of natural language acquisition. Examples of some current developments include bootstrapping hypotheses and constraint-based theories [9], optimality theory [80, 89], and neural theory of language [27, 28]. Empirical evidence from these results argues in favor of language learnability to address some of the key problems encountered in child language acquisition. ... |

376 |
The estimation of stochastic context-free grammars using the insideoutside algorithm. Computer Speech and Language
- Lari, Young
- 1990
Citation Context ...antages of SCFG lie in their ability to capture the embedded structure within the speech data and their superior predictive power in comparison with regular grammars as measured by prediction entropy [50]. The Inside-Outside algorithm can be used to estimate the free parameters of a SCFG. Given a set of positive training sentences and a SCFG whose parameters are randomly initialized, the inside-outsid... |
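The inside pass of the Inside-Outside algorithm is a probabilistic CKY: for a SCFG in Chomsky normal form it computes P(A ⇒* w_i…w_j) bottom-up over spans, and the full sentence probability is the inside value of the start symbol over the whole string. A sketch with a toy grammar (the grammar itself is an assumption for illustration):

```python
from collections import defaultdict

def inside_prob(binary_rules, lexical_rules, start, words):
    """Inside (CKY) probabilities for a stochastic CFG in Chomsky
    normal form: returns P(start =>* words).
    binary_rules : {(A, B, C): p} for rules A -> B C
    lexical_rules: {(A, w): p}    for rules A -> w"""
    n = len(words)
    inside = defaultdict(float)      # (i, j, A) -> P(A =>* words[i:j])
    for i, w in enumerate(words):
        for (A, word), p in lexical_rules.items():
            if word == w:
                inside[(i, i + 1, A)] += p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):          # split point
                for (A, B, C), p in binary_rules.items():
                    inside[(i, j, A)] += p * inside[(i, k, B)] * inside[(k, j, C)]
    return inside[(0, n, start)]

# Toy grammar: S -> A A (1.0); A -> 'a' (0.7) | 'b' (0.3)
binary = {('S', 'A', 'A'): 1.0}
lexical = {('A', 'a'): 0.7, ('A', 'b'): 0.3}
```

The outside pass and the re-estimation of rule probabilities from expected rule counts (the EM part) are omitted here; only the inside probabilities are shown.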

333 |
Three models for the description of languages
- Chomsky
- 1956
Citation Context ...or equivalently languages) which is guaranteed to contain the unknown grammar. The classes of grammars such as regular or context free grammars that belong to the Chomsky hierarchy of formal grammars [13, 42] are often used to model the target grammar. The methods for grammar inference typically identify an unknown grammar (or an approximation of it) from a set of candidate hypotheses (e.g., the set of re... |

314 | Distributed representations, simple recurrent networks, and grammatical structure
- Elman
- 1991
Citation Context ... recurrent neural networks (RNN) [62]. A variety of RNN architectures have been investigated for learning grammars from a set of positive and negative examples of the target language (see for example [17, 18, 25, 26, 31, 34, 32, 58, 78, 85, 97]). In the following section we describe the second order recurrent neural network architecture due to Giles et al. [31]. 3.6.1 Second Order RNN for Regular Grammar Inference Recurrent neural networks h... |

306 | Cryptographic limitations on learning boolean formulae and finite automata
- Kearns, Valiant
- 1989
Citation Context ...er of membership queries [3]. However, PAC learning of DFA is proven to be a hard problem in that there exists no polynomial time algorithm that efficiently PAC learns DFA from labeled examples alone [44, 77]. The PAC model's requirement of learnability under all conceivable probability distributions is often considered too stringent in practice. Pitt identified the following open research problem: Are DF... |

223 |
Complexity of automaton identification from given data
- Gold
- 1978
Citation Context ...lone [36]. Further, it has been shown that there exists no efficient learning algorithm for identifying the minimum-state DFA that is consistent with an arbitrary set of positive and negative examples [37]. Efficient algorithms for identification of DFA assume that additional information is provided to the learner. This information is typically in the form of a set of examples S that satisfies certain ... |

215 | An evolutionary algorithm that constructs recurrent neural networks
- Angeline, Saunders, et al.
- 1994
Citation Context ... network from efficiently learning the target grammar. Angeline et al. proposed an evolutionary network induction algorithm for simultaneously acquiring both the network topology and the weight values [2]. Their algorithm, called GNARL (GeNeralized Acquisition of Recurrent Links), uses evolutionary programming techniques which rely on mutation as the sole genetic reproduction operator (as against genet... |

209 | The induction of dynamical recognizers
- Pollack
- 1991
Citation Context ... recurrent neural networks (RNN) [62]. A variety of RNN architectures have been investigated for learning grammars from a set of positive and negative examples of the target language (see for example [17, 18, 25, 26, 31, 34, 32, 58, 78, 85, 97]). In the following section we describe the second order recurrent neural network architecture due to Giles et al. [31]. 3.6.1 Second Order RNN for Regular Grammar Inference Recurrent neural networks h... |

204 |
Syntactic Pattern Recognition and Applications
- Fu
- 1982
Citation Context ...achine learning, machine perception, robotics, planning, knowledge representation, and reasoning. Formal language models are extensively used in syntactic or linguistic pattern classification systems [29, 30]. The structural inter-relationships among the linguistic pattern attributes are easily captured by representing the patterns as strings (a collection of syntactic symbols). Learning a set of rules fo... |

187 | The need for biases in learning generalizations
- Mitchell
- 1980
Citation Context ...rning and computational learning theory appear to lend credence to Chomsky's view. Almost all learning algorithms either explicitly or implicitly use appropriate representational and inductive biases [59, 61]. The representational bias of the algorithm determines the space of candidate hypotheses (or sets of grammars in the case of language acquisition) that are considered by the learner. In order for lea... |

172 |
Learning and extracting finite state automata with second-order recurrent neural networks
- Giles, Miller, et al.
- 1992

171 |
Adaptation in Natural and Artificial Systems
- Holland
- 1975
Citation Context ... exponential in the size of the PTA. Figure 6 comes here (Bi-directional search) 3.4.2 Randomized Search Genetic algorithms offer an attractive framework for randomized search in large hypothesis spaces [40]. A typical genetic search involves evolving a randomly generated set of individuals (from the hypothesis space) based on the survival-of-the-fittest principle of Darwinian evolution. A population of ra... |
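The evolutionary loop this passage describes — evaluate, select, recombine, mutate — is sketched below on the classic "one-max" toy fitness (counting 1-bits). Encoding finite state automata as the individuals is the harder, representation-specific part and is not attempted here; everything in this sketch is an illustrative assumption.

```python
import random

def genetic_search(fitness, length, pop_size=30, generations=60, seed=0):
    """Minimal genetic search: tournament selection, one-point
    crossover, and occasional bit-flip mutation over bitstrings."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        def tournament():
            a, b = rng.sample(pop, 2)
            return a if fitness(a) >= fitness(b) else b
        nxt = []
        while len(nxt) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, length)
            child = p1[:cut] + p2[cut:]        # one-point crossover
            if rng.random() < 0.1:             # bit-flip mutation
                i = rng.randrange(length)
                child[i] ^= 1
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)

best = genetic_search(sum, 20)   # fitness = number of 1-bits
```

For grammar induction the fitness function would score each encoded automaton against the labeled examples, but the selection/crossover/mutation skeleton is the same.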

169 | Inference of finite automata using homing sequences
- Rivest, Schapire
- 1993
Citation Context ...has to treat its current state as the start state and explore the environment further. Rivest and Schapire have proposed a method based on homing sequences to learn the target DFA in these situations [83]. If it is assumed that each state of the DFA has an output (the output could simply be 1 for an accepting state and 0 for a non-accepting state) then a homing sequence is defined as a sentence whose ... |
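A word is a homing sequence exactly when its output trace pins down the state reached, regardless of where the machine started. A small checker makes the definition concrete; the two example machines are made up for illustration.

```python
def is_homing_sequence(delta, output, states, word):
    """True if, for a machine with transitions `delta[(q, a)]` and
    per-state outputs `output[q]`, the output trace of `word` uniquely
    determines the state reached from any unknown start state."""
    trace_to_end = {}
    for q in states:
        trace, cur = [], q
        for a in word:
            cur = delta[(cur, a)]
            trace.append(output[cur])
        key = tuple(trace)
        if key in trace_to_end and trace_to_end[key] != cur:
            return False                 # same trace, different end state
        trace_to_end[key] = cur
    return True

# Two-state flip-flop: reading 'a' toggles the state; outputs differ,
# so the trace of "a" reveals where we ended up.
flip = {(0, 'a'): 1, (1, 'a'): 0}
flip_out = {0: 0, 1: 1}

# Here states 0 and 1 swap on 'a' but both output 0, so the trace of
# "a" cannot distinguish the two possible end states.
merge = {(0, 'a'): 1, (1, 'a'): 0, (2, 'a'): 2}
merge_out = {0: 0, 1: 0, 2: 1}
```

Rivest and Schapire's learner uses such sequences to re-orient itself after each experiment, since it cannot reset the environment to a known start state.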

150 |
Concepts and Consequences of the Theory of Government and Binding
- Chomsky
- 1982
Citation Context ...ibly be answered by studying formal models of language acquisition. Research in language acquisition has benefited from advances in several disciplines including cognitive psychology [9], linguistics [15, 16, 75], theoretical computer science [42, 55], computational learning theory [45, 63], artificial intelligence [84], machine learning [48, 61], and pattern recognition [30, 57]. Psychological studies of lan... |

150 |
Finite State Automata and Simple Recurrent Networks
- Cleeremans, Servan-Schreiber, et al.
- 1989

137 | Approaches to the automatic discovery of patterns in biosequences
- Brazma, Jonassen, et al.
- 1995
Citation Context ...ral practical uses including speech recognition, discovery of patterns in biosequences, image segmentation, interpretation of ECG, handwriting recognition, recognition of seismic signals and the like [8, 30, 57]. The issues and practical difficulties associated with formal language learning models can provide useful insights for the development of language understanding systems. Several key questions in natu... |

137 | Learning stochastic regular grammars by means of a state merging method
- Carrasco, Oncina
- 1994
Citation Context ... comes here (deterministic SFA) 4.1.1 The Alergia Algorithm for Learning SFA Carrasco and Oncina have developed an algorithm for the inference of deterministic stochastic finite state automata (DSFA) [12]. A DSFA is a SFA where for each state q_i ∈ Q and symbol a ∈ Σ there exists at most one state q_j such that p_ij(a) ≠ 0. This algorithm, called Alergia, is based on a state merging approach... |
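At the heart of Alergia's state-merging decisions is a Hoeffding-style statistical test: two observed frequencies f1/n1 and f2/n2 are judged compatible (i.e., plausibly drawn from the same underlying probability) when their difference stays within a confidence bound √(½ ln(2/α))·(1/√n1 + 1/√n2). A sketch of that test (the choice of α is an illustrative default):

```python
import math

def alergia_compatible(f1, n1, f2, n2, alpha=0.05):
    """Hoeffding-style compatibility test used by Alergia: returns
    True when the observed frequencies f1/n1 and f2/n2 differ by less
    than the confidence bound, so the two states may be merged."""
    bound = math.sqrt(0.5 * math.log(2.0 / alpha)) * \
            (1.0 / math.sqrt(n1) + 1.0 / math.sqrt(n2))
    return abs(f1 / n1 - f2 / n2) < bound
```

The full algorithm applies this test recursively to the outgoing transition and termination frequencies of candidate state pairs in a prefix tree acceptor, merging whenever every comparison passes.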

130 | Inducing probabilistic grammars by Bayesian model merging
- Stolcke, Omohundro
- 1994
Citation Context ...e. In order to determine the model parameters an iterative procedure called the Baum-Welch algorithm can be used [7]. Stolcke and Omohundro present a more general approach to the HMM learning problem [88]. Their approach is a Bayesian model merging strategy which facilitates learning the HMM structure as well as the model parameters from a given set of positive examples. The first step constructs an i... |

120 |
Machine Learning: A Theoretical Approach
- Natarajan
- 1992
Citation Context ...guage acquisition has benefited from advances in several disciplines including cognitive psychology [9], linguistics [15, 16, 75], theoretical computer science [42, 55], computational learning theory [45, 63], artificial intelligence [84], machine learning [48, 61], and pattern recognition [30, 57]. Psychological studies of language acquisition have explored the development of the children's cognitive fac... |

119 |
Approaches to Machine Learning
- Langley, Carbonell
- 1984
Citation Context ...anguage on the basis of a finite number of sentences that they encounter during their formative years. Consequently, natural language acquisition has been, and continues to be, a major focus of research [48, 75]. The past decade has seen significant theoretical as well as experimental advances in the study of natural language acquisition. Examples of some current developments include bootstrapping hypotheses... |

103 |
Inferring regular languages in polynomial update time
- Oncina, Garcia
- 1992
Citation Context ...h is a representative set that includes information about the states and transitions of the target DFA. They showed that it is possible to exactly identify the target DFA from a characteristic sample [65]. Angluin has described the use of a minimally adequate teacher to guide the learner in the identification of the target DFA. A minimally adequate teacher is capable of answering membership queries of... |

90 | Results of the Abbadingo One DFA learning competition and a new evidence-driven state merging algorithm
- Lang, Pearlmutter, et al.
- 1998
Citation Context ...her, as demonstrated by the results of the recent Abbadingo One DFA Learning Competition, several efficient heuristic approaches provide satisfactory solutions to the regular grammar inference problem [39, 43, 46]. Context-free and context-sensitive grammars generate languages that are clearly more expressive than those generated by regular grammars. This has motivated a large body of research on algorithms fo... |

90 | Graded State Machines: The representation of temporal contingencies in simple recurrent networks
- Servan-Schreiber, Cleeremans, et al.
- 1991

84 | Learning models of intelligent agents
- Carmel, Markovitch
- 1996
Citation Context ...e target DFA in the limit is of interest. Particularly, in the case of intelligent autonomous agents, incremental learning offers an attractive framework for characterizing the behavior of the agents [11]. Parekh et al. have proposed an efficient incremental algorithm for learning regular grammars using membership queries [72]. Their method extends Angluin's ID algorithm to an incremental framework.... |

84 | Optimality: from neural networks to universal grammar
- Prince, Smolensky
- 1997
Citation Context ...well as experimental advances in the study of natural language acquisition. Examples of some current developments include bootstrapping hypotheses and constraint-based theories [9], optimality theory [80, 89], and neural theory of language [27, 28]. Empirical evidence from these results argues in favor of language learnability to address some of the key problems encountered in child language acquisition. ... |

81 |
Finite Automata: Behavior and Synthesis
- Trakhtenbrot, Barzdin
- 1973
Citation Context ...mplete labeled sample, i.e., a sample that includes all sentences up to a particular length with the corresponding label that indicates whether the sentence is a positive example or a negative example [92]. Oncina and Garcia have defined a characteristic set of examples which is a representative set that includes information about the states and transitions of the target DFA. They showed that it is pos... |

78 | Inductive Inference, DFAs, and Computational Complexity
- Pitt
- 1989
Citation Context ...g DFA, the goal of a PAC learning algorithm is to obtain from a randomly drawn set of labeled examples, in polynomial time, with high probability, a DFA that is a good approximation of the target DFA [76]. A good approximation of the target is defined as one for which the probability of error on an unseen example is less than a pre-specified error parameter ε. Since on a given run, the randomly draw... |

76 |
Formal models of language learning
- Pinker
- 1979
Citation Context ...rs are recognized by pushdown automata which are simply finite state automata augmented with a pushdown stack. Context free grammars are adequate for several practical natural language modeling tasks [19, 73]. Context sensitive grammars represent the next level of grammars in the hierarchy and unrestricted grammars (which place no restriction on the form of the production rules) complete the formal langua... |

71 |
Artificial Intelligence: A Modern Approach
- Russell, Norvig
- 1995
Citation Context ...rom advances in several disciplines including cognitive psychology [9], linguistics [15, 16, 75], theoretical computer science [42, 55], computational learning theory [45, 63], artificial intelligence [84], machine learning [48, 61], and pattern recognition [30, 57]. Psychological studies of language acquisition have explored the development of the children's cognitive faculties, the nature of stimuli av... |

63 |
The Human Semantic Potential
- Regier
- 1996
Citation Context ...development of formal models of how children learn language [27, 28]. Regier proposed a computational model of how some lexical items describing spatial relations might develop in different languages [82]. His system includes a simple model of the visual system which is common to all human beings and thus must be the source from which all visual concepts arise. Using conventional backpropagation techn... |

62 | When Push Comes to Shove: A Computational Model of the Role of Motor Control in the Acquisition of Action Verbs - Bailey - 1997 |

61 | Modeling embodied lexical development - Feldman, Narayanan, et al. - 1997 |

61 | Extraction of rules from discretetime recurrent neural networks
- Omlin, Giles
- 1996
Citation Context ...ist methods is that the learned models are not as transparent as is the case with symbolic approaches. Several researchers have studied methods for extracting finite state automata from a trained RNN [20, 33, 64, 97]. It is observed that RNN develop an internal state representation in the form of clusters in the activation space of the recurrent state neurons. Therefore, a symbolic description of the learned fini... |
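One extraction strategy in this family is quantization: snap each hidden state to a coarse grid cell and read the observed cell-to-cell moves off as automaton transitions. The "network" below is a hand-made flip-flop stand-in for a trained RNN (an assumption made purely to keep the sketch self-contained); only the bookkeeping is the point.

```python
import numpy as np

def extract_automaton(step, s0, strings, grid=0.5):
    """Quantization-based extraction: run the network on sample
    strings, snap every hidden state to a grid cell, and record the
    observed (cell, symbol) -> cell transitions."""
    def cell(s):
        return tuple(np.round(s / grid).astype(int))
    transitions = {}
    for w in strings:
        s = s0
        for sym in w:
            src = cell(s)
            s = step(s, sym)
            transitions[(src, sym)] = cell(s)
    return transitions

def toy_step(s, sym):
    """Stand-in 'RNN': saturates to +1 on 'a' and -1 on 'b'."""
    return np.array([1.0]) if sym == 'a' else np.array([-1.0])

fsa = extract_automaton(toy_step, np.array([0.0]), ["ab", "ba", "aa", "bb"])
```

On a real trained network the recorded transitions need not be deterministic; extraction methods then cluster the states (or refine the grid) until a consistent automaton emerges, and minimize it afterwards.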

58 |
Introduction to Languages and the Theory of Computation
- Martin
- 1991
Citation Context ...of language acquisition. Research in language acquisition has benefited from advances in several disciplines including cognitive psychology [9], linguistics [15, 16, 75], theoretical computer science [42, 55], computational learning theory [45, 63], artificial intelligence [84], machine learning [48, 61], and pattern recognition [30, 57]. Psychological studies of language acquisition have explored the dev... |

56 | Learning simple concepts under simple distributions
- Li, Vitanyi
- 1991
Citation Context ...h as simple k-reversible DFA and log n-term DNF (i.e., boolean formulas in disjunctive normal form) are PAC learnable under this model while their learnability under the standard PAC model is unknown [53]. Recently, Denis et al. proposed a variant of the simple PAC learning model where a teacher might intelligently select simple examples based on his/her knowledge of the target. Under this model (calle... |

53 |
Life-like computer characters: the persona project at microsoft research
- Ball, Ling, et al.
- 1997
Citation Context ...guage learning stems also from the numerous practical applications of language learning by machines. Research in instructible robots [19] and intelligent software agents and conversational interfaces [6] is geared towards the design of agents that can understand and execute verbal instructions given in a natural language (such as English) or some restricted subset of a natural language. For example, ... |

45 | What is the Search Space of the Regular Inference
- Dupont, Miclet, et al.
- 1994
Citation Context ...hat if the set of positive examples provided to the learner is a structurally complete set then the lattice constructed above is guaranteed to contain the minimum state DFA equivalent to the target [24, 67, 69]. A structurally complete set for the target DFA is a set of positive examples (S+) such that for each transition of the target DFA there is at least one string in S+ that covers the transition and...