Results 1 - 10
of
19
Inferring strategies for sentence ordering in multidocument news summarization
- Journal of Artificial Intelligence Research
, 2002
"... The problem of organizing information for multidocument summarization so that the generated summary is coherent has received relatively little attention. While sentence ordering for single document summarization can be determined from the ordering of sentences in the input article, this is not the c ..."
Abstract
-
Cited by 72 (7 self)
- Add to MetaCart
The problem of organizing information for multidocument summarization so that the generated summary is coherent has received relatively little attention. While sentence ordering for single document summarization can be determined from the ordering of sentences in the input article, this is not the case for multidocument summarization where summary sentences may be drawn from different input articles. In this paper, we propose a methodology for studying the properties of ordering information in the news genre and describe experiments done on a corpus of multiple acceptable orderings we developed for the task. Based on these experiments, we implemented a strategy for ordering information that combines constraints from chronological order of events and topical relatedness. Evaluation of our augmented algorithm shows a significant improvement of the ordering over two baseline strategies. 1.
Catching the Drift: Probabilistic Content Models, with Applications to Generation and Summarization
, 2004
"... We consider the problem of modeling the content structure of texts within a specific domain, in terms of the topics the texts address and the order in which these topics appear. ..."
Abstract
-
Cited by 67 (3 self)
- Add to MetaCart
We consider the problem of modeling the content structure of texts within a specific domain, in terms of the topics the texts address and the order in which these topics appear.
2006. Joint extraction of entities and relations for opinion recognition
- In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP
, 2006
"... We present an approach for the joint extraction of entities and relations in the context of opinion recognition and analysis. We identify two types of opinion-related entities — expressions of opinions and sources of opinions — along with the linking relation that exists between them. Inspired by Ro ..."
Abstract
-
Cited by 23 (3 self)
- Add to MetaCart
We present an approach for the joint extraction of entities and relations in the context of opinion recognition and analysis. We identify two types of opinion-related entities — expressions of opinions and sources of opinions — along with the linking relation that exists between them. Inspired by Roth and Yih (2004), we employ an integer linear programming approach to solve the joint opinion recognition task, and show that global, constraint-based inference can significantly boost the performance of both relation extraction and the extraction of opinion-related entities. Performance further improves when a semantic role labeling system is incorporated. The resulting system achieves F-measures of 79 and 69 for entity and relation extraction, respectively, improving substantially over prior results in the area. 1
Learning Document-Level Semantic Properties from Free-text Annotations
"... This paper demonstrates a new method for leveraging unstructured annotations to infer semantic document properties. We consider the domain of product reviews, which are often annotated by their authors with free-text keyphrases, such as “a real bargain ” or “good value. ” We leverage these unstructu ..."
Abstract
-
Cited by 18 (2 self)
- Add to MetaCart
This paper demonstrates a new method for leveraging unstructured annotations to infer semantic document properties. We consider the domain of product reviews, which are often annotated by their authors with free-text keyphrases, such as “a real bargain ” or “good value. ” We leverage these unstructured annotations by clustering them into semantic properties, and then tying the induced clusters to hidden topics in the document text. This allows us to predict relevant properties of unannotated documents. Our approach is implemented in a hierarchical Bayesian model with joint inference, which increases the robustness of the keyphrase clustering and encourages document topics to correlate with semantically meaningful properties. We perform several evaluations of our model, and find that it substantially outperforms alternative approaches. 1
Language Models for Hierarchical Summarization
, 2003
"... Hierarchies have long been used for organization, summarization, and access to information. In this dissertation we define summarization in terms of a probabilistic language model and use this definition to explore a new technique for automatically generating topic hierarchies. We use the language ..."
Abstract
-
Cited by 8 (2 self)
- Add to MetaCart
Hierarchies have long been used for organization, summarization, and access to information. In this dissertation we define summarization in terms of a probabilistic language model and use this definition to explore a new technique for automatically generating topic hierarchies. We use the language model to characterize the documents that will be summarized and then apply a graph-theoretic algorithm to determine the best topic words for the hierarchical summary. This work is very different from previous attempts to generate topic hierarchies because it relies on statistical analysis and language modeling to identify descriptive words for a document and organize the words in a hierarchical structure. We compare
A system for query-specific document summarization
- In 15th ACM international conference on Information and knowledge management (CIKM
, 2006
"... There has been a great amount of work on query-independent summarization of documents. However, due to the success of Web search engines query-specific document summarization (query result snippets) has become an important problem, which has received little attention. We present a method to create q ..."
Abstract
-
Cited by 8 (1 self)
- Add to MetaCart
There has been a great amount of work on query-independent summarization of documents. However, due to the success of Web search engines query-specific document summarization (query result snippets) has become an important problem, which has received little attention. We present a method to create queryspecific summaries by identifying the most query-relevant fragments and combining them using the semantic associations within the document. In particular, we first add structure to the documents in the preprocessing stage and convert them to document graphs. Then, the best summaries are computed by calculating the top spanning trees on the document graphs. We present and experimentally evaluate efficient algorithms that support computing summaries in interactive time. Furthermore, the quality of our summarization method is compared to current approaches using a user survey.
Using cohesive properties of text for Automatic Summarization
- In JOTRI’02
, 2002
"... A system allowing extractive automatic summarization of textual documents is presented. The system is based on the cohesive properties of text, namely lexical chains, co-reference chains and named entity chains. In this way the system extend the well known lexicalchaining paradigm for summari ..."
Abstract
-
Cited by 6 (2 self)
- Add to MetaCart
A system allowing extractive automatic summarization of textual documents is presented. The system is based on the cohesive properties of text, namely lexical chains, co-reference chains and named entity chains. In this way the system extend the well known lexicalchaining paradigm for summarization. The system has been applied to summarization tasks on Spanish agency news. Results of its evaluation and comparison with a couple os baseline systems are presented.
Automatic creation of domain templates
, 2006
"... Recently, many Natural Language Processing (NLP) applications have improved the quality of their output by using various machine learning techniques to mine Information Extraction (IE) patterns for capturing information from the input text. Currently, to mine IE patterns one should know in advance t ..."
Abstract
-
Cited by 6 (0 self)
- Add to MetaCart
Recently, many Natural Language Processing (NLP) applications have improved the quality of their output by using various machine learning techniques to mine Information Extraction (IE) patterns for capturing information from the input text. Currently, to mine IE patterns one should know in advance the type of the information that should be captured by these patterns. In this work we propose a novel methodology for corpus analysis based on cross-examination of several document collections representing different instances of the same domain. We show that this methodology can be used for automatic domain template creation. As the problem of automatic domain template creation is rather new, there is no well-defined procedure for the evaluation of the domain template quality. Thus, we propose a methodology for identifying what information should be present in the template. Using this information we evaluate the automatically created domain templates through the text snippets retrieved according to the created templates. 1
Automatic Extraction of Knowledge from Web Documents
- Workshop on Human Language Technology for the Semantic Web and Web Services, 2 nd Int. Semantic Web Conf. Sanibel Island
, 2003
"... A large amount of digital information available is written as text documents in the form of web pages, reports, papers, emails, etc. Extracting the knowledge of interest from such documents from multiple sources in a timely fashion is therefore crucial. This paper provides an update on the Artequ ..."
Abstract
-
Cited by 5 (3 self)
- Add to MetaCart
A large amount of digital information available is written as text documents in the form of web pages, reports, papers, emails, etc. Extracting the knowledge of interest from such documents from multiple sources in a timely fashion is therefore crucial. This paper provides an update on the Artequakt system which uses natural language tools to automatically extract knowledge about artists from multiple documents based on a predefined ontology. The ontology represents the type and form of knowledge to extract. This knowledge is then used to generate tailored biographies. The information extraction process of Artequakt is detailed and evaluated in this paper.
Web based Knowledge Extraction and Consolidation for Automatic Ontology Instantiation
- 2 nd Int. Conf. Knowledge Capture (KCap'03), Workshop on Knowledge Markup and Semantic Annotation
, 2003
"... The Web is probably the largest and richest information repository available today. Search engines are the common access routes to this valuable source. However, the role of these search engines is often limited to the retrieval of lists of potentially relevant documents. The burden of analysing the ..."
Abstract
-
Cited by 4 (3 self)
- Add to MetaCart
The Web is probably the largest and richest information repository available today. Search engines are the common access routes to this valuable source. However, the role of these search engines is often limited to the retrieval of lists of potentially relevant documents. The burden of analysing the returned documents and identifying the knowledge of interest is therefore left to the user. The Artequakt system aims to deploy natural language tools to automatically extract and consolidate knowledge from web documents and instantiate a given ontology, which dictates the type and form of knowledge to extract. Artequakt focuses on the domain of artists, and uses the harvested knowledge to generate tailored biographies. This paper describes the latest developments of the system and discusses the problem of knowledge consolidation.

