LEARNING DETERMINISTIC REGULAR GRAMMARS FROM STOCHASTIC SAMPLES IN POLYNOMIAL TIME
, 1999
Cited by 38 (12 self)
In this paper, the identification of stochastic regular languages is addressed. For this purpose, we propose a class of algorithms that identify the structure of the minimal stochastic automaton generating the language. We show that the time required grows only linearly with the size of the sample set, and we provide a measure of the complexity of the task. In experiments, our implementation proves very fast for practical applications.
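The abstract does not describe the algorithms themselves. One widely used approach to identifying a stochastic automaton from positive samples builds a frequency prefix tree from the sample and merges states whose observed frequencies are statistically compatible. The sketch below shows only that compatibility test, using a Hoeffding bound in the style of the ALERGIA algorithm. It is an assumption about the general technique, not this paper's exact procedure, and the names `build_pta`, `different`, `compatible` and the confidence parameter `alpha` are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical helpers: an ALERGIA-style compatibility test, not the paper's code.

def build_pta(sample):
    """Build a frequency prefix tree acceptor; states are prefixes (strings).

    arrivals[q]  = number of sample strings passing through state q
    finals[q]    = number of sample strings ending at q
    trans[q][a]  = number of sample strings leaving q on symbol a
    """
    arrivals = defaultdict(int)
    finals = defaultdict(int)
    trans = defaultdict(lambda: defaultdict(int))
    for word in sample:
        state = ""
        arrivals[state] += 1
        for symbol in word:
            trans[state][symbol] += 1
            state += symbol
            arrivals[state] += 1
        finals[state] += 1
    return arrivals, finals, trans

def different(f1, n1, f2, n2, alpha):
    """Hoeffding-bound test: True if f1/n1 and f2/n2 differ significantly."""
    if n1 == 0 or n2 == 0:
        return False  # no evidence against merging
    bound = math.sqrt(0.5 * math.log(2.0 / alpha)) * (
        1.0 / math.sqrt(n1) + 1.0 / math.sqrt(n2))
    return abs(f1 / n1 - f2 / n2) > bound

def compatible(q1, q2, pta, alpha, alphabet):
    """Recursively check whether states q1 and q2 could be merged."""
    arrivals, finals, trans = pta
    n1, n2 = arrivals.get(q1, 0), arrivals.get(q2, 0)
    # Termination probabilities must agree.
    if different(finals.get(q1, 0), n1, finals.get(q2, 0), n2, alpha):
        return False
    for a in alphabet:
        t1 = trans.get(q1, {}).get(a, 0)
        t2 = trans.get(q2, {}).get(a, 0)
        # Transition probabilities on each symbol must agree.
        if different(t1, n1, t2, n2, alpha):
            return False
        # Successor states must be compatible too (when both exist).
        c1, c2 = q1 + a, q2 + a
        if arrivals.get(c1, 0) and arrivals.get(c2, 0):
            if not compatible(c1, c2, pta, alpha, alphabet):
                return False
    return True
```

A full learner would visit the prefix-tree states in order, merge each one with the first earlier state it is compatible with, and fold the subtrees together. Because the bound shrinks as the counts grow, small samples merge aggressively, while larger samples separate states more finely.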
Information Extraction in Structured Documents using Tree Automata Induction
, 2002
Cited by 18 (5 self)
Information extraction (IE) addresses the problem of extracting specific information from a collection of documents. Much of the previous work for IE from structured documents formatted in HTML or XML uses techniques for IE from strings, such as grammar and automata induction. However, such documents have a tree structure. Hence it is natural to investigate methods that are able to recognise and exploit this tree structure. We do this by exploring the use of tree automata for IE in structured documents. Experimental results on benchmark data sets show that our approach compares favorably with previous approaches.
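The abstract does not describe how the automata are induced. For readers unfamiliar with tree automata, the sketch below shows only how a deterministic bottom-up tree automaton runs over a document tree: each node's state is computed from its tag and the states of its children, and the tree is accepted if the root reaches a final state. The transition table, state names and the functions `run` and `accepts` are hypothetical; systems for unranked HTML/XML trees typically use a ranked encoding or regular languages over child sequences rather than this exact-tuple table.

```python
import xml.etree.ElementTree as ET

# Hypothetical sketch: running a deterministic bottom-up tree automaton.

def run(node, delta):
    """Compute the state of `node` bottom-up.

    `delta` maps (tag, tuple_of_child_states) to a state. A missing
    entry means the automaton has no transition, so the result is None.
    """
    child_states = []
    for child in node:  # children in document order
        state = run(child, delta)
        if state is None:
            return None
        child_states.append(state)
    return delta.get((node.tag, tuple(child_states)))

def accepts(root, delta, final_states):
    """Accept the tree if the root reaches a final state."""
    return run(root, delta) in final_states

# Hypothetical automaton: a <tr> whose first cell holds a <b> label
# and whose second cell is empty.
delta = {
    ("b", ()): "bold",
    ("td", ("bold",)): "label_cell",
    ("td", ()): "value_cell",
    ("tr", ("label_cell", "value_cell")): "record",
}
```

For example, `accepts(ET.fromstring("<tr><td><b/></td><td/></tr>"), delta, {"record"})` is true, while a row of two plain `<td/>` cells has no matching transition and is rejected.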
The Acquisition of a Unification-Based Generalised Categorial Grammar
, 2002
Cited by 18 (3 self)
The purpose of this work is to investigate the process of grammatical acquisition from data. In order to do that, a computational learning system is used, composed of a Universal Grammar with associated parameters, and a learning algorithm, following the Principles and Parameters Theory. The Universal Grammar is implemented as a Unification-Based Generalised Categorial Grammar, embedded in a default inheritance network of lexical types. The learning algorithm receives input from a corpus of spontaneous child-directed transcribed speech annotated with logical forms and sets the parameters based on this input. This framework is used as a basis to investigate several aspects of language acquisition. In this thesis I concentrate on the acquisition of subcategorisation frames and word order information from data. The data to which the learner is exposed can be noisy and ambiguous, and I investigate how these factors affect the learning process. The results obtained show a robust learner converging towards the target grammar given the input data available. They also show how the amount of noise present in the input data affects the speed of convergence of the learner towards the target grammar. Future work is suggested to investigate the developmental stages of language acquisition as predicted by the learning model, with a thorough comparison with the developmental stages of a child. This is primarily a cognitive computational model of language learning that can be used to investigate and gain a better understanding of human language acquisition, and can potentially be relevant to the development of more adaptive NLP technology.
Learning Regular Languages From Simple Positive Examples
, 2000
Cited by 17 (0 self)
Learning from positive data constitutes an important topic in Grammatical Inference since it is believed that the acquisition of grammar by children needs only syntactically correct (i.e. positive) instances. However, classical learning models provide no way to avoid the problem of over-generalization. In order to overcome this problem, we use a learning model from simple examples, where the notion of simplicity is defined with the help of Kolmogorov complexity. We show that a general and natural heuristic which allows learning from simple positive examples can be developed in this model. Our main result is that the class of regular languages is probably exactly learnable from simple positive examples.
Using grammatical inference to automate information extraction from the web
- In Principles of Data Mining and Knowledge Discovery
, 2001
Cited by 12 (0 self)
The World-Wide Web contains a wealth of semistructured information sources that often give partial/overlapping views on the same domains, such as real estate listings or book prices. These partial sources could be used more effectively if integrated into a single view; however, since they are typically formatted in diverse ways for human viewing, extracting their data for integration is a difficult challenge. Existing learning systems for this task generally use hardcoded ad hoc heuristics, are restricted in the domains and structures they can recognize, and/or require manual training. We describe a principled method for automatically generating extraction wrappers using grammatical inference that can recognize general structures and does not rely on manually labelled examples. Domain-specific knowledge is explicitly separated out in the form of declarative rules. The method is demonstrated in a test setting by extracting real estate listings from web pages and integrating them into an interactive data visualization tool based on dynamic queries.
Inductive Inference with Procrastination: Back to Definitions
- Fundamenta Informaticae
, 1999
Cited by 8 (2 self)
In this paper, we reconsider the definition of procrastinating learning machines. In the original definition of Freivalds and Smith [FS93], constructive ordinals are used to bound mind changes. We investigate the possibility of using arbitrary linearly ordered sets to bound mind changes in a similar way. It turns out that, using certain ordered sets, it is possible to define inductive inference types different from the previously known ones. We investigate properties of the new inductive inference types and compare them to other types.
This research was supported by Latvian Science Council Grant No. 93.599 and NSF Grant 9421640. Some of the results from this paper were presented earlier [AFS96]. The third author was supported in part by NSF Grant 9301339.
1 Introduction. We study inductive inference using the model developed by Gold [Gol67]. There is a well-known hierarchy of larger and larger classes of learnable sets of phenomena based on the number of times a learning machine is all...
Learning XML Grammars
- In Machine Learning and Data Mining in Pattern Recognition (MLDM'01), number 2123 in LNCS
, 2001
Cited by 8 (1 self)
We sketch possible applications of grammatical inference techniques to problems arising in the context of XML. The idea is to infer document type definitions (DTDs) of XML documents in situations when either the original DTD is missing or should be (re)designed or should be restricted to a more user-oriented view on a subset of the (given) DTD. The usefulness of such an approach is underlined by the importance of knowing appropriate DTDs; this knowledge can be exploited, e.g., for optimizing database queries based on XML.
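Inferring a DTD from documents starts from the observed data: for each element name, the sequences of child elements that actually occur. Each set of sequences is a finite sample of a regular language, which a grammatical inference algorithm then generalizes into a content model (XML 1.0 additionally requires content models to be deterministic). The sketch below, with the hypothetical function name `child_sequences`, covers only this data-collection step; the generalization step is not shown and the paper's own method may differ.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sketch: collect the per-element training samples that a
# DTD-inference learner would generalize. Text content and attributes are ignored.

def child_sequences(documents):
    """Map each element name to the set of child-name sequences observed."""
    seqs = defaultdict(set)
    for doc in documents:
        root = ET.fromstring(doc)
        for elem in root.iter():  # the root itself and all its descendants
            seqs[elem.tag].add(tuple(child.tag for child in elem))
    return dict(seqs)

docs = [
    "<book><title/><author/></book>",
    "<book><title/><author/><author/></book>",
]
samples = child_sequences(docs)
```

Here `samples["book"]` holds the two sequences `("title", "author")` and `("title", "author", "author")`; one possible generalization a learner might produce is the content model `(title, author+)`.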
Comparing Two Unsupervised Grammar Induction Systems: Alignment-Based Learning vs. EMILE
, 2001
"... In this paper we set out to compare ..."
Unsupervised Grammar Inference Systems for Natural Language
, 2002
Cited by 7 (0 self)
In recent years there have been significant advances in the field of Unsupervised Grammar Inference (UGI) for Natural Languages such as English or Dutch. This paper presents a broad range of UGI implementations, where we can begin to see how the theory has been put into practice. Several mature systems are emerging, built using complex models and capable of deriving natural language grammatical phenomena. The range of systems is classified into: models based on Categorial Grammar (GraSp, CLL, EMILE); Memory Based Learning models (FAMBL, RISE); Evolutionary computing models (ILM, LAgts); and string-pattern searches (ABL, GB). An objectively measurable statistical comparison of the performance of the systems reviewed is not yet feasible. However, their merits and shortfalls are discussed, along with an outlook on the future of UGI.

