Results 1 - 10
of
55
A Trainable Document Summarizer
, 1995
"... To summarize is to reduce in complexity, and hence in length, while retaining some of the essential qualities of the original. This paper focusses on document extracts, a particular kind of computed document summary. ..."
Abstract
-
Cited by 342 (2 self)
- Add to MetaCart
To summarize is to reduce in complexity, and hence in length, while retaining some of the essential qualities of the original. This paper focusses on document extracts, a particular kind of computed document summary.
A Comparison of Two Learning Algorithms for Text Categorization
- In Third Annual Symposium on Document Analysis and Information Retrieval
, 1994
"... This paper examines the use of inductive learning to categorize natural language documents into predefined content categories. Categorization of text is of increasing importance in information retrieval and natural language processing systems. Previous research on automated text categorization has m ..."
Abstract
-
Cited by 239 (1 self)
- Add to MetaCart
This paper examines the use of inductive learning to categorize natural language documents into predefined content categories. Categorization of text is of increasing importance in information retrieval and natural language processing systems. Previous research on automated text categorization has mixed machine learning and knowledge engineering methods, making it difficult to draw conclusions about the performance of particular methods. In this paper we present empirical results on the performance of a Bayesian classifier and a decision tree learning algorithm on two text categorization data sets. We find that both algorithms achieve reasonable performance and allow controlled tradeoffs between false positives and false negatives. The stepwise feature selection in the decision tree algorithm is particularly effective in dealing with the large feature sets common in text categorization. However, even this algorithm is aided by an initial prefiltering of features, confirming the results...
Evaluating Natural Language Processing Systems
, 1993
"... This report presents a detailed analysis and review of NLP evaluation, in principle and in practice. Part 1 examines evaluation concepts and establishes a framework for NLP system evaluation. This makes use of experience in the related area of information retrieval and the analysis also refers to ev ..."
Abstract
-
Cited by 104 (0 self)
- Add to MetaCart
This report presents a detailed analysis and review of NLP evaluation, in principle and in practice. Part 1 examines evaluation concepts and establishes a framework for NLP system evaluation. This makes use of experience in the related area of information retrieval and the analysis also refers to evaluation in speech processing. Part 2 surveys significant evaluation work done so far, for instance in machine translation, and discusses the particular problems of generic system evaluation. The conclusion is that evaluation strategies and techniques for NLP need much more development, in particular to take proper account of the influence of system tasks and settings. Part 3 develops a general approach to NLP evaluation, aimed at methodologically-sound strategies for test and evaluation motivated by comprehensive performance factor identification. The analysis throughout the report is supported by extensive illustrative examples. This work was carried out under the UK Science and Engineeri...
The Rhetorical Parsing, Summarization, and Generation of Natural Language Texts
, 1997
"... This thesis is an inquiry into the nature of the high-level, rhetorical structure of unrestricted natural language texts, computational means to enable its derivation, and two applications (in automatic summarization and natural language generation) that follow from the ability to build such structu ..."
Abstract
-
Cited by 98 (9 self)
- Add to MetaCart
This thesis is an inquiry into the nature of the high-level, rhetorical structure of unrestricted natural language texts, computational means to enable its derivation, and two applications (in automatic summarization and natural language generation) that follow from the ability to build such structures automatically. The thesis proposes a first-order formalization of the high-level, rhetorical structure of text. The formalization assumes that text can be sequenced into elementary units; that discourse relations hold between textual units of various sizes; that some textual units are more important to the writer's purpose than others; and that trees are a good approximation of the abstract structure of text. The formalization also introduces a linguistically motivated compositionality criterion, which is shown to hold for the text structures that are valid. The thesis proposes, analyzes theoretically, and compares empirically four algorithms for determining the valid text structures of ...
An Empirical Study of Automated Dictionary Construction for Information Extraction in Three Domains
- Artificial Intelligence
, 1996
"... this paper, we describe experiments with AutoSlog in two additional domains: joint ventures and microelectronics. We compare the performance of AutoSlog across the three domains, discuss the lessons learned about the generality of this approach, and present results from two experiments which demonst ..."
Abstract
-
Cited by 73 (14 self)
- Add to MetaCart
this paper, we describe experiments with AutoSlog in two additional domains: joint ventures and microelectronics. We compare the performance of AutoSlog across the three domains, discuss the lessons learned about the generality of this approach, and present results from two experiments which demonstrate that novice users can generate effective dictionaries using AutoSlog. 1 Introduction Portability is a crucial concern for researchers in knowledge-based natural language processing (NLP). Knowledge-based NLP systems typically rely on a conceptual dictionary that has been manually encoded for a specific domain. Although knowledge-based systems have performed well on certain tasks (e.g., [2,4,5,11,16,23]), these systems will not be practical for real world applications until the knowledge that they need can be acquired automatically. Preprint submitted to Elsevier Preprint 21 March We have developed a system called AutoSlog that generates conceptual dictionaries for information extraction automatically. Information extraction (IE) is essentially a form of text skimming, in which specific types of information are extracted from text. There has been a lot of work recently on information extraction in conjunction with the recent message understanding conferences [26--28]. Most information extraction systems rely on a manually encoded dictionary of extraction patterns (e.g., see [12,15,1]). Using AutoSlog, the UMass/MUC-4 system was the first system that could acquire domainspecific extraction patterns automatically [17,18]. In previous work, we showed that AutoSlog could create effective extraction patterns for the domain of terrorism [30]. A dictionary generated by AutoSlog for the terrorism domain achieved 98% of the performance of a handcrafted dictionary that required a...
The Automated Acquisition of Topic Signatures for Text Summarization
- Proc. Of the COLING Conference
, 2000
"... In order to produce a good summary, one has to identify the most relevant portions of a given text. We describe in this paper a method for automatically training topic signatures{sets of related words, with associated weights, organized around head topics{and illustrate with signatures we created wi ..."
Abstract
-
Cited by 69 (7 self)
- Add to MetaCart
In order to produce a good summary, one has to identify the most relevant portions of a given text. We describe in this paper a method for automatically training topic signatures{sets of related words, with associated weights, organized around head topics{and illustrate with signatures we created with 6,194 TREC collection texts over 4 selected topics. We describe the possible integration of topic signatures with ontologies and its evaluaton on an automated text summarization system. 1 Introduction This paper describes the automated creation of what we call topic signatures, constructs that can play a central role in automated text summarization and information retrieval. Topic signatures can be used to identify the presence of a complex concept|a concept that consists of several related components in xed relationships. Restaurant-visit, for example, involves at least the concepts menu, eat, pay, and possibly waiter, and Dragon Boat Festival (in Taiwan) involves the concepts calamus...
Catching the Drift: Probabilistic Content Models, with Applications to Generation and Summarization
, 2004
"... We consider the problem of modeling the content structure of texts within a specific domain, in terms of the topics the texts address and the order in which these topics appear. ..."
Abstract
-
Cited by 67 (3 self)
- Add to MetaCart
We consider the problem of modeling the content structure of texts within a specific domain, in terms of the topics the texts address and the order in which these topics appear.
Information Extraction: Beyond Document Retrieval
- COMPUTATIONAL LINGUISTICS AND CHINESE LANGUAGE PROCESSING
, 1998
"... In this paper we give a synoptic view of the growth text processing technology of information extraction (IE) whose function is to extract information about a pre-specified set of entities, relations or events from natural language textsand to record this information in structured representations ..."
Abstract
-
Cited by 48 (10 self)
- Add to MetaCart
In this paper we give a synoptic view of the growth text processing technology of information extraction (IE) whose function is to extract information about a pre-specified set of entities, relations or events from natural language textsand to record this information in structured representations called templates. Here we describe the nature of the IE task, review the history of the area from its origins in AI work in the 1960's and 70's till the present, discuss the techniques being used to carry out the task, describe application areas where IE systems are or are about to be at work, and conclude with a discussion of the challenges facing the area. What emerges is a picture of an exciting new text processing technology with a host of new applications, both on its own and in conjunction with other technologies, such as information retrieval, machine translation and data mining.
Ultra-Summarization: A Statistical Approach to Generating Highly Condensed Non-Extractive Summaries
- In SIGIR99
, 1999
"... Using current extractive summarization techniques, it is impossible to produce a coherent document summary shorter than a single sentence, or to produce a summary that conforms to particular stylistic constraints. Ideally, one would prefer to understand the document, and to generate an appropriate s ..."
Abstract
-
Cited by 41 (0 self)
- Add to MetaCart
Using current extractive summarization techniques, it is impossible to produce a coherent document summary shorter than a single sentence, or to produce a summary that conforms to particular stylistic constraints. Ideally, one would prefer to understand the document, and to generate an appropriate summary directly from the results of that understanding. Absent a comprehensive natural language understanding system, an approximation must be used. This paper presents an alternative statistical model of a summarization process, which jointly applies statistical models of the term selection and term ordering process to produce brief coherent summaries in a style learned from a training corpus. 1 Introduction Summarization is one of the most important capabilities required in writing. Effective summarization, like effective writing, is neither easy nor innate; rather, it is a skill that is developed through instruction and practice [Hidi and Anderson, 1986; Hooper et al., 1994] . Generating...
Headline Generation Based on Statistical Translation
- In Proceedings of ACL-2000
"... Extractive summarization techniques cannot generate document summaries shorter than a single sentence, something that is often required. An ideal summarization system would understand each document and generate an appropriate summary directly from the results of that understanding. A more pr ..."
Abstract
-
Cited by 35 (0 self)
- Add to MetaCart
Extractive summarization techniques cannot generate document summaries shorter than a single sentence, something that is often required. An ideal summarization system would understand each document and generate an appropriate summary directly from the results of that understanding. A more practical approach to this problem results in the use of an approximation: viewing summarization as a problem analogous to statistical machine translation.

