Using cohesive properties of text for Automatic Summarization
In JOTRI’02, 2002
Cited by 6 (2 self)
A system for extractive automatic summarization of textual documents is presented. The system is based on the cohesive properties of text, namely lexical chains, co-reference chains and named entity chains. In this way, the system extends the well-known lexical chaining paradigm for summarization. The system has been applied to summarization tasks on Spanish agency news. Results of its evaluation and a comparison with a couple of baseline systems are presented.
An Integrated Statistical Model for Tagging and Chunking Unrestricted Text
2000
Cited by 3 (2 self)
In this paper, we present a corpus-based approach for tagging and chunking. The formalism used is based on stochastic finite-state automata. Therefore, it can include n-gram models or any stochastic finite-state automata learnt using grammatical inference techniques. As the models involved in our system are learnt automatically, it allows for a very flexible and portable system for different languages and chunk definitions. In order to show the viability of our approach, we present results for tagging and chunking using different combinations of bigrams and other more complex automata learnt by means of the Error Correcting Grammatical Inference (ECGI) algorithm. The experimentation was carried out on the Wall Street Journal corpus for English and on the LexEsp corpus for Spanish.
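As a rough illustration of the simplest model family the abstract mentions (bigrams), the sketch below trains a bigram tagger from tagged sentences and decodes with the Viterbi algorithm. It is not the authors' system and does not implement ECGI; the function names (`train`, `viterbi`, `_logp`), the add-one smoothing, and the toy training data are all assumptions made for the example.

```python
import math
from collections import defaultdict

def train(tagged_sentences):
    """Count tag-bigram transitions and tag-to-word emissions."""
    trans = defaultdict(lambda: defaultdict(int))  # prev_tag -> tag -> count
    emit = defaultdict(lambda: defaultdict(int))   # tag -> word -> count
    for sentence in tagged_sentences:
        prev = "<s>"  # sentence-start pseudo-tag
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word.lower()] += 1
            prev = tag
    return trans, emit

def _logp(table, key, item, n_outcomes):
    """Add-one smoothed log-probability (smoothing is an assumption here)."""
    row = table.get(key, {})
    return math.log((row.get(item, 0) + 1) / (sum(row.values()) + n_outcomes))

def viterbi(words, trans, emit):
    """Return the most probable tag sequence under the bigram model."""
    tags = list(emit)
    n_tags = len(tags)
    # Vocabulary size plus one slot for unknown words.
    n_words = len({w for row in emit.values() for w in row}) + 1
    best = {"<s>": (0.0, [])}  # tag -> (log-score, best path ending in tag)
    for word in words:
        w = word.lower()
        nxt = {}
        for tag in tags:
            e = _logp(emit, tag, w, n_words)
            score, path = max(
                ((s + _logp(trans, prev, tag, n_tags) + e, p)
                 for prev, (s, p) in best.items()),
                key=lambda x: x[0],
            )
            nxt[tag] = (score, path + [tag])
        best = nxt
    return max(best.values(), key=lambda x: x[0])[1]

# Toy data, invented for the example.
corpus = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VB")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VB")],
]
trans, emit = train(corpus)
print(viterbi(["the", "dog", "sleeps"], trans, emit))
```

A chunker could be layered on the same machinery by treating chunk labels as the output symbols, though how the paper combines the two models is not specified in the abstract.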
Noun phrase translations for Cross-Language Document Selection
Cited by 3 (1 self)
This paper presents results for the CLEF interactive Cross-Language Document Selection task at the UNED. Two translation techniques were compared: the standard Systran translations provided by the CLEF organizers as a baseline, and a phrase-based pseudo-translation approach that uses a phrase alignment algorithm based on comparable corpora. The hypothesis being tested was that noun phrase translations could serve as summarized information for relevance judgment without compromising the precision of such judgments. In addition, we wanted an indirect measure of the quality of our phrase extraction process, which had previously been developed for an interactive CLIR application. The results of the experiment confirm that the hypothesis is reasonable: a set of 8 monolingual Spanish speakers judged English documents with the same precision for both systems, but achieved 52% more recall using phrasal translations than using full Systran translations.
Integrating cohesion and coherence for automatic summarization
In the Proceedings of the 11th Meeting of the European Chapter of the Association for Computational Linguistics (EACL-03, 2003
Cited by 3 (0 self)
This paper presents the integration of cohesive properties of text with coherence relations, to obtain an adequate representation of text for automatic summarization. A summarizer based on Lexical Chains is enhanced with rhetorical and argumentative structure obtained via Discourse Markers. When evaluated on a newspaper corpus, this integration yields only a slight improvement in the resulting summaries and cannot beat a dummy baseline consisting of the first sentence in the document. Nevertheless, we argue that this approach relies on basic linguistic mechanisms and is therefore genre-independent.
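The "dummy baseline" mentioned in the abstract above, the first sentence of the document, is simple enough to sketch in a few lines of Python. The function names and the naive regular-expression sentence splitter are assumptions for illustration; the paper's own segmentation is not described in the abstract.

```python
import re

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p.strip() for p in parts if p.strip()]

def lead_baseline(text, n=1):
    """Return the first n sentences of the document as its summary."""
    return " ".join(split_sentences(text)[:n])

# Invented example document.
doc = "El gobierno aprobó la ley. La oposición protestó. El debate sigue."
print(lead_baseline(doc))
```

On news text the opening sentence usually states the main event, which is one plausible reason such a baseline is hard to beat on a newspaper corpus.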
APOLN: A Partial Parser Of Unrestricted Text
In Proceedings of 5th Conference on Computational Lexicography and Text Research COMPLEX-99, Pecs, 1999
Cited by 2 (1 self)
In this paper, we present APOLN (Analizador Parcial de Oraciones en Lenguaje Natural): a partial parser of unrestricted natural language sentences based on finite-state techniques. Partial parsing has been used in several applications: syntactic parsing of unrestricted texts, data extraction systems, machine translation, solving the attachment ambiguity, speech recognition systems, text summarization, etc. The main attraction of partial parsing is that it can handle unrestricted sentences that contain lexical errors or constructions not accepted by the defined grammar. Partial parsing is an alternative to wide-coverage grammars, which are expensive and complex to define and present well-known problems such as overgeneration, undergeneration and ambiguity. We present APOLN as a tool that can be used to construct syntactically annotated corpora from lexically tagged corpora. We also present the results (precision and recall rates) of applying APOLN to unrestricted Spanish corpora, and show how tagging errors influence the performance of the parser.
Stemming in Spanish: A first approach to its impact on information retrieval
In [17, 2002
Cited by 2 (0 self)
Most models and techniques employed in Information Retrieval at some time or other use frequency counts of the terms appearing in both documents and queries. Many words that derive from the same stem have a close semantic content. Locating stems common to several words and grouping those words by replacing them with the corresponding stem can improve the working of these systems. Stemming procedures differ, however, from language to language. We describe a stemmer for Spanish and the tests carried out by applying it to Information Retrieval.
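To make the idea of grouping words under a common stem before counting frequencies concrete, here is a minimal Python sketch. It is not the stemmer described in the paper: the suffix list, the minimum stem length, and the function names are illustrative assumptions, and a realistic Spanish stemmer would need far more rules (verb conjugations, accents, irregular forms).

```python
from collections import Counter

# Illustrative suffixes only (an assumption), tried longest first.
SUFFIXES = sorted(
    ["amientos", "amiento", "aciones", "ación", "idades", "idad",
     "mente", "es", "as", "os", "a", "o", "s"],
    key=len,
    reverse=True,
)
MIN_STEM = 3  # do not strip if fewer than 3 characters would remain

def stem(word):
    """Strip the longest matching suffix, keeping at least MIN_STEM characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: -len(suffix)]
    return word

def stem_counts(tokens):
    """Term frequencies computed over stems instead of surface forms."""
    return Counter(stem(t) for t in tokens)

print(stem_counts(["libro", "libros", "Libro", "casa"]))
```

With this toy rule set, "libro" and "libros" fall under the same stem, so a query for one form can match documents containing the other.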
Representation and treatment of multiword expressions in Basque
In Proceedings of the ACL workshop on Multiword Expressions, 2004
Cited by 2 (0 self)
This paper describes the representation of Basque Multiword Lexical Units and the automatic processing of Multiword Expressions. After discussing and stating which kinds of multiword expressions we process at the current stage of the work, we present the representation schema of the corresponding lexical units in a general-purpose lexical database. Due to its expressive power, the schema can deal not only with fixed expressions but also with morphosyntactically flexible constructions. It also allows us to lemmatize word combinations as a unit and yet to parse the components individually if necessary. Moreover, we describe HABIL, a tool for the automatic processing of these expressions, and we give some evaluation results. This work must be placed in a general framework of written Basque processing tools, which currently ranges from the tokenization and segmentation of single words up to the syntactic tagging of general texts.
Design Principles for a Spanish Treebank
In Proceedings of The Workshop on Treebanks and Linguistic Theories (TLT2002, 2002
Cited by 1 (0 self)
Treebanks are widely recognised as a necessary source of information in NLP as well as in linguistic studies. In this paper we present and justify methodological principles and syntactic criteria for building a Treebank for Spanish: annotating only explicit information, constituents and syntactic functions, and being theory independent. Previous work is also presented in order to account for the decisions taken. The annotation process will be done in different steps so that each step provides the input for the next. We present the basic guidelines of syntactic annotation and the boundaries of the work to be done in a first step: annotation of low constituents and surface functions. Moreover, some semantic information (subject type) is likely to be included.
Incremental Partial Parser Of Unrestricted Natural Language Sentences
In SNRFAI99, 1999
Cited by 1 (1 self)
One of the current focuses of research within natural language processing is the partial and robust parsing of sentences written in natural language. Partial parsing could be used in diverse applications such as data extraction, machine translation, dialogue systems, etc. Its main attraction is that it can handle unrestricted sentences that contain lexical errors or constructions not accepted by the defined grammar. Partial parsing is an alternative to wide-coverage grammars, which are expensive and complex to define and present well-known problems such as overgeneration, undergeneration and ambiguity. In this paper, we present APOLN (Analizador Parcial de Oraciones en Lenguaje Natural), a partial parser of unrestricted natural language sentences based on finite-state machines. APOLN is an incremental parser that permits the compiling and inheritance of feature structures between levels of processing. We present the results of applying APOLN to an unrestricted Spanish corpus, and we will use it in a speech dialogue system.
Semantic Parsing with Verbal Subcategorization
This paper presents a semantic parsing approach for non-domain-specific texts. Our approach is based on the application of a verbal subcategorization lexicon (LEXPIR) developed in the Pirapides project.

