Discovering grammar rules for Automatic Extraction of Definitions
- In Doctoral Consortium at the Eurolan Summer School 2007, Iasi, Romania, 2007
Cited by 3 (3 self)
Automatic extraction of definitions from text documents can be very useful in various scenarios, especially in eLearning systems. In this paper, we propose an approach aimed at assisting the discovery of grammar rules which can be used to identify definitions, using Genetic Algorithms and Genetic Programming. By categorising definitions to enable the learning of more specialised grammars, we aim to improve the performance of our learning programs. A genetic algorithm will be used to learn the relative importance of particular predefined features in definitions. To support this algorithm, we also propose a genetic program to evolve new features from existing ones.
Linguistic knowledge and question answering
- Traitement Automatique des Langues, 2006
Cited by 2 (2 self)
ABSTRACT. The availability of robust and deep syntactic parsing can improve the performance of all modules of a Question Answering system. In this article, this is illustrated using examples from our QA system Joost, a Dutch QA system which has been used for both open and closed domain QA. The system can make use of information found in the fully parsed version of the document collections. We demonstrate that this improves the performance of various components of the system, such as answer extraction and selection, lexical acquisition, off-line relation extraction, and passage retrieval.
Using Syntactic Knowledge for QA
Cited by 1 (0 self)
Abstract. We describe the system of the University of Groningen for
On the evaluation of Polish definition extraction grammars
This paper presents the results of experiments in the automatic extraction of definitions (for semi-automatic glossary construction) from usually unstructured or only weakly structured e-learning texts in Polish. The extraction is performed by regular grammars over XML-encoded, morphosyntactically annotated documents. The results, although perhaps still not fully satisfactory, are carefully evaluated and compared to the inter-annotator agreement; they clearly improve on previous definition extraction attempts for Polish.
Who is Nelson Mandela? or What is MTV?).
The availability of robust and deep syntactic parsing can improve the performance of Question Answering systems. This is illustrated using examples from Joost, a Dutch QA system which has been used for both open (CLEF) and closed domain QA.
1 Linguistically Informed IR
Information retrieval is used in most QA systems to select relevant passages from large document collections, narrowing down the search for the answer extraction modules. Given a full syntactic analysis of the text collection, it becomes feasible to exploit linguistic information as a knowledge source for IR. Using Apache's IR system Lucene, we can index the document collection along various linguistic dimensions, such as part of speech tags, named entity classes, and dependency relations. Tiedemann (2005) uses a genetic algorithm to optimize the use of such an extended IR index, and shows that it leads to significant improvements of IR performance.
2 Acquisition of Lexical Knowledge
Syntactic similarity measures can be used for automatic acquisition of lexical knowledge required for QA, as well as for answer extraction and ranking. For instance, in van der Plas and Bouma (2005) it is shown that automatically acquired class-labels for named entities improve the accuracy of answering general WH-questions (e.g. Which ferry sank in the Baltic Sea?) and questions which ask for the definition of a named entity (e.g. Who is Nelson Mandela? or What is MTV?).
Learning to Mine Definitions from Slovene Structured and Unstructured Knowledge-Rich Resources
The paper presents an innovative approach to extract Slovene definition candidates from domain-specific corpora using morphosyntactic patterns, automatic terminology recognition and semantic tagging with wordnet senses. First, a classification model was trained on examples from Slovene Wikipedia, which was then used to find well-formed definitions among the extracted candidates. The results of the experiment are encouraging, with accuracy ranging from 67% to 71%. The paper also addresses some drawbacks of the approach and suggests ways to overcome them in future work.
Automatic Grammar Rule Extraction and Ranking for Definitions
Learning texts contain much implicit knowledge which is ideally presented to the learner in a structured manner. A typical example is the definitions of terms in the text, which would ideally be presented separately as a glossary for easy access. The problem is that manual extraction of such information can be tedious and time consuming. In this paper we describe two experiments carried out to enable the automated extraction of definitions from non-technical learning texts using evolutionary algorithms. A genetic programming approach is used to learn grammatical rules helpful in discriminating between definitions and non-definitions, after which a genetic algorithm is used to learn the relative importance of these rules, thus enabling the ranking of candidate sentences in order of confidence. The results achieved are promising: we show that a Genetic Program can automatically learn rules similar to those derived by a human linguistic expert, and that a Genetic Algorithm can then assign weighted scores to those rules so as to rank extracted definitions effectively in order of confidence.
Definition Extraction from Text: an experiment using evolving algorithms
- 2008
Definition extraction can be useful for the creation of glossaries and in question answering systems. It is a tedious task to extract such sentences manually, and thus an automatic system is desirable. In this work we review various attempts at rule-based approaches reported in the literature and discuss their results. We also propose a novel experiment involving the use of genetic programming and genetic algorithms, aimed at assisting the discovery of grammar rules which can be used for the task of definition extraction.
Definition Characterisation through Genetic Algorithms
The identification of definitions from natural language texts is useful in learning environments, for glossary creation and question answering systems. It is a tedious task to extract such definitions manually, and several techniques have been proposed for automatic definition identification in these domains, including rule-based and statistical methods. These techniques usually rely on linguistic expertise to identify grammatical and word patterns which characterize definitions. In this paper, we look at the use of machine learning techniques, in particular genetic algorithms, to enable the automatic extraction of definitions. Genetic algorithms are used to learn a set of numerical weights that express the relative importance of linguistic features which can be present or absent in definitional sentences. In this work we report on the results of various experiments carried out and evaluate them on an eLearning corpus. We also propose a way forward for discovering such features automatically through genetic programming and suggest how these two techniques can be used together for definition extraction.
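The weighting idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation. The feature names, toy sentences, score threshold, and all genetic-algorithm settings (population size, tournament selection, uniform crossover, Gaussian mutation, elitism) are assumptions chosen only to keep the example small.

```python
import random

# Hypothetical binary features; each toy sentence is a feature vector
# plus a label (1 = definition, 0 = non-definition).
FEATURES = ["has_is_a", "has_defining_verb", "starts_with_pronoun"]
DATA = [
    ((1, 0, 0), 1),
    ((1, 1, 0), 1),
    ((0, 1, 0), 1),
    ((0, 0, 1), 0),
    ((1, 0, 1), 0),
    ((0, 0, 0), 0),
]
THRESHOLD = 0.5  # assumed cut-off on the weighted score


def score(weights, features):
    """Weighted sum of the features present in a sentence."""
    return sum(w * f for w, f in zip(weights, features))


def fitness(weights):
    """Fraction of toy sentences classified correctly by the weights."""
    correct = sum(
        (score(weights, feats) >= THRESHOLD) == bool(label)
        for feats, label in DATA
    )
    return correct / len(DATA)


def evolve(pop_size=20, generations=30, mutation_rate=0.2, seed=0):
    """Evolve one weight per feature; returns (best_weights, history)."""
    rng = random.Random(seed)
    n = len(FEATURES)
    population = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        next_gen = [population[0][:]]  # elitism: keep the best individual
        while len(next_gen) < pop_size:
            # Tournament selection of two parents.
            p1 = max(rng.sample(population, 3), key=fitness)
            p2 = max(rng.sample(population, 3), key=fitness)
            # Uniform crossover followed by Gaussian mutation.
            child = [rng.choice(pair) for pair in zip(p1, p2)]
            child = [
                w + rng.gauss(0, 0.3) if rng.random() < mutation_rate else w
                for w in child
            ]
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness), history


def rank(weights, sentences):
    """Order candidate feature vectors by descending weighted score."""
    return sorted(sentences, key=lambda feats: score(weights, feats), reverse=True)


if __name__ == "__main__":
    best, history = evolve()
    print(dict(zip(FEATURES, best)), fitness(best))
```

The learned weights can then be passed to `rank` to order candidate sentences by confidence, which mirrors the ranking use described in the listing. A real fitness function would more likely use precision, recall, or F-measure on an annotated corpus than accuracy on six toy sentences.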

