Results 1–10 of 13
Information Extraction in Structured Documents using Tree Automata Induction
2002
Cited by 19 (5 self)
Information extraction (IE) addresses the problem of extracting specific information from a collection of documents. Much of the previous work on IE from structured documents formatted in HTML or XML uses techniques for IE from strings, such as grammar and automata induction. However, such documents have a tree structure. Hence it is natural to investigate methods that are able to recognise and exploit this tree structure. We do this by exploring the use of tree automata for IE in structured documents. Experimental results on benchmark data sets show that our approach compares favorably with previous approaches.
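The sketch below shows how a tree automaton classifies a document tree. It covers only the acceptance check that an induced automaton performs, not the induction step described in the abstract. It assumes a deterministic bottom-up automaton and a tuple-based tree encoding. The transition table, state names and the toy "two-cell row" automaton are all hypothetical.

```python
from typing import Dict, FrozenSet, Optional, Tuple

# Hypothetical tree encoding: (label, children), children being a tuple of trees.
Tree = Tuple[str, tuple]
# Hypothetical transition table: (label, child states left to right) -> state.
Delta = Dict[Tuple[str, Tuple[str, ...]], str]


def run(tree: Tree, delta: Delta, final: FrozenSet[str]) -> bool:
    """Run a deterministic bottom-up tree automaton; True iff the root state is final."""

    def state_of(t: Tree) -> Optional[str]:
        label, children = t
        child_states = tuple(state_of(c) for c in children)
        if None in child_states:  # some subtree has no transition: reject
            return None
        return delta.get((label, child_states))

    root_state = state_of(tree)
    return root_state is not None and root_state in final


# Toy automaton (our own example): accepts a table row with exactly two cells.
delta = {("td", ()): "cell", ("tr", ("cell", "cell")): "row"}
final = frozenset({"row"})
print(run(("tr", (("td", ()), ("td", ()))), delta, final))  # True
print(run(("tr", (("td", ()),)), delta, final))  # False
```

The automaton computes states from the leaves upward. A node with no matching transition makes the whole tree rejected.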
Information extraction from web documents based on local unranked tree automaton inference
In IJCAI 2003
2003
Cited by 11 (1 self)
Information extraction (IE) aims at extracting specific information from a collection of documents. Much previous work on IE from semistructured documents (in XML or HTML) uses learning techniques based on strings. Some recent work converts the document to a ranked tree and uses tree automaton induction. This paper introduces an algorithm that uses unranked trees to induce an automaton. Experiments show that this gives the best results obtained so far for IE from semistructured documents based on learning.
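The abstract contrasts converting documents to ranked trees with inducing automata directly on unranked trees. One standard way to turn an unranked HTML/XML tree into a ranked (binary) one is the first-child/next-sibling encoding, sketched below. The abstract does not say which conversion the earlier work used, so treat this as an illustrative assumption. The tuple encodings and function names are hypothetical.

```python
from typing import List, Optional, Tuple

Tree = Tuple[str, tuple]  # unranked: (label, children), any number of children
# Ranked binary encoding: (label, first_child, next_sibling); None marks absence.
Binary = Optional[Tuple[str, "Binary", "Binary"]]


def to_binary(tree: Tree, siblings: tuple = ()) -> Binary:
    """First-child/next-sibling encoding of `tree` followed by its right siblings."""
    label, children = tree
    first = to_binary(children[0], children[1:]) if children else None
    nxt = to_binary(siblings[0], siblings[1:]) if siblings else None
    return (label, first, nxt)


def from_binary(node: Binary) -> List[Tree]:
    """Inverse of to_binary: decode a sibling chain into a list of unranked trees."""
    forest = []
    while node is not None:
        label, first, nxt = node
        forest.append((label, tuple(from_binary(first))))
        node = nxt
    return forest


page = ("html", (("body", (("p", ()), ("p", ()))),))
print(to_binary(page))  # ('html', ('body', ('p', None, ('p', None, None)), None), None)
print(from_binary(to_binary(page)) == [page])  # True
```

Every node ends up with exactly two ranked positions (first child, next sibling). A ranked tree automaton can therefore process pages whose nodes have arbitrarily many children, at the cost of splitting sibling lists into chains.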
Learning Rational Stochastic Tree Languages
Cited by 5 (2 self)
Abstract. We consider the problem of learning stochastic tree languages, i.e. probability distributions over a set of trees T(F), from a sample of trees independently drawn according to an unknown target P. We consider the case where the target is a rational stochastic tree language, i.e. it can be computed by a rational tree series or, equivalently, by a multiplicity tree automaton. In this paper, we provide two contributions. First, we show that rational tree series admit a canonical representation with parameters that can be efficiently estimated from samples. Then, we give an inference algorithm that identifies the class of rational stochastic tree languages in the limit with probability one.
Learning multiplicity tree automata
 In: Proceedings of the 8th International Colloquium on Grammatical Inference (ICGI’06). Volume 4201 of LNCS
2006
Cited by 4 (1 self)
Abstract. In this paper, we present a theoretical approach to the problem of learning multiplicity tree automata. These automata allow one to define functions that compute a number for each tree. They can be seen as a strict generalization of stochastic tree automata, since they allow one to define functions over any field K. A multiplicity automaton admits a support, which is a nondeterministic automaton. From a grammatical inference point of view, this paper presents a contribution that is original due to the combination of two important aspects: as far as we know, this is the first time that a learning method focuses on nondeterministic tree automata that compute functions over a field. The algorithm proposed in this paper stands in Angluin’s exact model, where a learner is allowed to use membership and equivalence queries. We show that this algorithm runs in time polynomial in the size of the representation.
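The sketch below illustrates what a multiplicity tree automaton computes. It does not reproduce the learning algorithm, which relies on membership and equivalence queries. Each tree is mapped bottom-up to a vector of state weights, and the tree's value is a linear combination of that vector, here with the rationals as the field K. The sparse weight encoding and the two-state automaton that counts `a`-leaves are our own assumptions.

```python
from fractions import Fraction
from typing import Dict, List, Tuple, Union

Tree = Tuple[str, tuple]  # (label, children)
Weight = Union[int, Fraction]
# Hypothetical sparse encoding: for each symbol f of arity p, a map
# (output_state, child_state_1, ..., child_state_p) -> weight in Q.
Mu = Dict[str, Dict[Tuple[int, ...], Weight]]


def state_vector(tree: Tree, mu: Mu, n_states: int) -> List[Fraction]:
    """mu(f(t1..tp))[i] = sum over (j1..jp) of mu_f[i, j1..jp] * prod_k mu(tk)[jk]."""
    label, children = tree
    child_vecs = [state_vector(c, mu, n_states) for c in children]
    out = [Fraction(0)] * n_states
    for (i, *js), weight in mu[label].items():
        term = Fraction(weight)
        for vec, j in zip(child_vecs, js):
            term *= vec[j]
        out[i] += term
    return out


def series(tree: Tree, mu: Mu, final: List[Fraction]) -> Fraction:
    """Value r(t) = final . mu(t) computed by the automaton."""
    vec = state_vector(tree, mu, len(final))
    return sum((f * v for f, v in zip(final, vec)), Fraction(0))


# Toy automaton over f/2, a/0, b/0 (our own example):
# state 0 carries the constant 1, state 1 carries the number of a-leaves.
mu = {
    "a": {(0,): 1, (1,): 1},
    "b": {(0,): 1},
    "f": {(0, 0, 0): 1, (1, 1, 0): 1, (1, 0, 1): 1},
}
final = [Fraction(0), Fraction(1)]
t = ("f", (("a", ()), ("f", (("a", ()), ("b", ())))))
print(series(t, mu, final))  # 2
```

Several states can carry nonzero weight at the same node. That is the nondeterministic support mentioned in the abstract, and the weights of all runs are summed.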
Generalized Stochastic Tree Automata for Multi-Relational Data Mining
2002
Cited by 3 (2 self)
This paper addresses the problem of learning a statistical distribution of data in a relational database. The data we focus on are represented as trees, which are a natural way to represent structured information. These trees are then used to infer a stochastic tree automaton, using a well-known grammatical inference algorithm.
Information Extraction from Structured Documents using k-testable Tree Automaton Inference
Cited by 3 (1 self)
Information extraction (IE) addresses the problem of extracting specific information from a collection of documents. Much of the previous work on IE from structured documents, such as HTML or XML, uses learning techniques that are based on strings, such as finite automata induction. This paper explores methods that exploit the tree structure of the documents. In particular, our method infers a k-testable tree automaton from a small set of annotated examples and explores various ways to generalize the inferred automaton. Experimental results on the benchmark data sets show that our approach compares favorably to the previous approaches.
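k-testable tree languages in the strict sense are usually characterized by local patterns of depth k (roots, forks, and small subtrees). The sketch below collapses these into a single set of depth-k truncations collected from positive examples. It accepts a tree if and only if every local pattern of the tree was seen during learning. This is a simplification for illustration, not the paper's automaton construction or its generalization steps. The tree encoding and all function names are hypothetical.

```python
from typing import Iterator, List, Set, Tuple

Tree = Tuple[str, tuple]  # (label, children)


def truncate(tree: Tree, k: int) -> Tree:
    """Keep only the top k levels of `tree` (k >= 1)."""
    label, children = tree
    if k == 1:
        return (label, ())
    return (label, tuple(truncate(c, k - 1) for c in children))


def subtrees(tree: Tree) -> Iterator[Tree]:
    """Yield every subtree of `tree`, root first."""
    yield tree
    for child in tree[1]:
        yield from subtrees(child)


def learn_patterns(examples: List[Tree], k: int) -> Set[Tree]:
    """Collect the depth-k pattern rooted at every node of the positive examples."""
    return {truncate(s, k) for t in examples for s in subtrees(t)}


def accepts(tree: Tree, patterns: Set[Tree], k: int) -> bool:
    """Accept iff every depth-k pattern of `tree` was seen during learning."""
    return all(truncate(s, k) in patterns for s in subtrees(tree))


def div(*children: Tree) -> Tree:
    return ("div", children)


p: Tree = ("p", ())
# Learned from one nested div, the k = 2 model generalizes to deeper nesting
# but rejects a div placed inside a p.
patterns = learn_patterns([div(div(p))], k=2)
print(accepts(div(div(div(p))), patterns, 2))  # True
print(accepts(div(("p", (div(),)),), patterns, 2))  # False
```

Smaller k yields more general models, because fewer and shorter patterns must match. Larger k moves acceptance toward the training trees themselves.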
A Comparison of PCFG Models
2000
Cited by 2 (1 self)
In this paper, we compare three different approaches to building a probabilistic context-free grammar for natural language parsing from a treebank corpus: 1) a model that simply extracts the rules contained in the corpus and counts the number of occurrences of each rule; 2) a model that also stores information about the parent node's category; and 3) a model that estimates the probabilities according to a generalized k-gram scheme with k = 3. The last one allows faster parsing and decreases the perplexity of test samples.
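Models 1) and 2) above read rules off the treebank and normalize each rule count by the count of its left-hand side. Model 2) additionally annotates each category with its parent's category. Below is a minimal sketch under our own assumptions: a tuple tree encoding, `^` as the annotation marker and a two-sentence toy treebank. The k-gram model 3) is not shown.

```python
from collections import Counter
from fractions import Fraction
from typing import Dict, Iterable, List, Optional, Tuple

Tree = Tuple[str, tuple]  # (label, children); words are leaves with no children
Rule = Tuple[str, Tuple[str, ...]]


def symbol(node: Tree, parent: Optional[str], annotate: bool) -> str:
    """Category of `node`, optionally annotated with its parent's category."""
    label, children = node
    if annotate and children and parent is not None:
        return f"{label}^{parent}"
    return label  # words and the root are never annotated


def extract_rules(tree: Tree, annotate: bool = False,
                  parent: Optional[str] = None) -> List[Rule]:
    label, children = tree
    if not children:
        return []
    rules = [(symbol(tree, parent, annotate),
              tuple(symbol(c, label, annotate) for c in children))]
    for child in children:
        rules.extend(extract_rules(child, annotate, label))
    return rules


def estimate(treebank: Iterable[Tree], annotate: bool = False) -> Dict[Rule, Fraction]:
    """Relative frequencies: P(A -> beta) = count(A -> beta) / count(A)."""
    counts = Counter(r for t in treebank for r in extract_rules(t, annotate))
    lhs_totals: Counter = Counter()
    for (lhs, _), n in counts.items():
        lhs_totals[lhs] += n
    return {rule: Fraction(n, lhs_totals[rule[0]]) for rule, n in counts.items()}


def node(label: str, *children: Tree) -> Tree:
    return (label, children)


def word(w: str) -> Tree:
    return (w, ())


treebank = [
    node("S", node("NP", word("she")), node("VP", node("V", word("runs")))),
    node("S", node("NP", word("he")),
         node("VP", node("V", word("sees")), node("NP", word("it")))),
]
plain = estimate(treebank)
print(plain[("NP", ("she",))])  # 1/3
annotated = estimate(treebank, annotate=True)
print(annotated[("NP^S", ("she",))])  # 1/2
```

Parent annotation splits NP into subject-position NP^S and object-position NP^VP. Each split category then gets its own rule distribution.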
Learning (k,l)-contextual tree languages for information extraction from web pages
2008
Cited by 2 (0 self)
This paper introduces a novel method for learning a wrapper for extraction of information from web pages, based upon (k, l)-contextual tree languages. It also introduces a method to learn good values of k and l based on a few positive and negative examples. Finally, it describes how the algorithm can be integrated in a tool for information extraction.
Tree k-Grammar Models for Natural Language Modelling and Parsing
Cited by 1 (1 self)
Abstract. In this paper, we compare three different approaches to building a probabilistic context-free grammar for natural language parsing from a treebank corpus: (1) a model that simply extracts the rules contained in the corpus and counts the number of occurrences of each rule; (2) a model that also stores information about the parent node’s category; and (3) a model that estimates the probabilities according to a generalized k-gram scheme for trees with k = 3. The last model allows faster parsing and considerably decreases the perplexity of test samples.