Results 1 - 10 of 60
A Trainable Document Summarizer
, 1995
Cited by 342 (2 self)
To summarize is to reduce in complexity, and hence in length, while retaining some of the essential qualities of the original. This paper focusses on document extracts, a particular kind of computed document summary.
A Comparison of Two Learning Algorithms for Text Categorization
- In Third Annual Symposium on Document Analysis and Information Retrieval
, 1994
Cited by 239 (1 self)
This paper examines the use of inductive learning to categorize natural language documents into predefined content categories. Categorization of text is of increasing importance in information retrieval and natural language processing systems. Previous research on automated text categorization has mixed machine learning and knowledge engineering methods, making it difficult to draw conclusions about the performance of particular methods. In this paper we present empirical results on the performance of a Bayesian classifier and a decision tree learning algorithm on two text categorization data sets. We find that both algorithms achieve reasonable performance and allow controlled tradeoffs between false positives and false negatives. The stepwise feature selection in the decision tree algorithm is particularly effective in dealing with the large feature sets common in text categorization. However, even this algorithm is aided by an initial prefiltering of features, confirming the results...
A Neural Network Approach to Topic Spotting
, 1995
Cited by 134 (1 self)
This paper presents an application of nonlinear neural networks to topic spotting. Neural networks allow us to model higher-order interaction between document terms and to simultaneously predict multiple topics using shared hidden features. In the context of this model, we compare two approaches to dimensionality reduction in representation: one based on term selection and another based on Latent Semantic Indexing (LSI). Two different methods are proposed for improving LSI representations for the topic spotting task. We find that term selection and our modified LSI representations lead to similar topic spotting performance, and that this performance is equal to or better than other published results on the same corpus. 1 Introduction Topic spotting is the problem of identifying which of a set of predefined topics are present in a natural language document. More formally, given a set of n topics and a document, the task is to output for each topic the probability that the topic is prese...
Automated Text Summarization in SUMMARIST
, 1999
Cited by 112 (10 self)
SUMMARIST is an attempt to create a robust automated text summarization system, based on the equation: summarization = topic identification + interpretation + generation. Each of these stages contains several independent modules, many of them trained on large corpora of text. We describe the system's architecture and provide details of some of its modules.
Advantages of query biased summaries in information retrieval
- In Proceedings of ACM SIGIR
, 1998
Cited by 89 (6 self)
www.dcs.gla.ac.uk/-tombrosa / www-ciir.cs.umass.edu/-sanderso/
This paper presents an investigation into the utility of document summarisation in the context of information retrieval, more specifically in the application of so-called query biased (or user directed) summaries: summaries customised to reflect the information need expressed in a query. Employed in the retrieved document list displayed after a retrieval took place, the summaries' utility was evaluated in a task-based environment by measuring users' speed and accuracy in identifying relevant documents. This was compared to the performance achieved when users were presented with the more typical output of an IR system: a static predefined summary composed of the title and first few sentences of retrieved documents. The results from the evaluation indicate that the use of query biased summaries significantly improves both the accuracy and speed of user relevance judgements.
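The contrast drawn in this abstract, between a static summary (title plus first few sentences) and a summary biased toward the query, can be illustrated with a minimal sketch. It is not the paper's method: the sentence splitter, the term-overlap score, and the function names `lead_summary` and `query_biased_summary` are all assumptions made for illustration.

```python
import re


def split_sentences(text):
    # Naive splitter (an assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def lead_summary(title, text, n=2):
    # Static summary: the title plus the first n sentences of the document.
    return [title] + split_sentences(text)[:n]


def query_biased_summary(query, text, n=2):
    # Hypothetical scoring: count distinct query terms appearing in each sentence.
    terms = set(re.findall(r"\w+", query.lower()))
    sentences = split_sentences(text)
    scored = []
    for pos, sentence in enumerate(sentences):
        words = set(re.findall(r"\w+", sentence.lower()))
        scored.append((len(terms & words), pos))
    # Keep the n best-scoring sentences; ties go to the earlier sentence.
    top = sorted(scored, key=lambda item: (-item[0], item[1]))[:n]
    # Present the selected sentences in document order.
    return [sentences[pos] for _, pos in sorted(top, key=lambda item: item[1])]
```

For the same document, `lead_summary` returns the same extract for every query, whereas `query_biased_summary` changes its selection with the query terms, which is the property the evaluation described above measures.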
Natural Language Processing for Information Retrieval
, 1996
Cited by 79 (2 self)
The paper summarizes the essential properties of document retrieval and reviews both conventional practice and research findings, the latter suggesting that simple statistical techniques can be effective. It then considers the new opportunities and challenges presented by the ability to search full text directly (rather than e.g. titles and abstracts), and suggests appropriate approaches to doing this, with a focus on the role of natural language processing. The paper also comments on possible connections with data and knowledge retrieval, and concludes by emphasizing the importance of rigorous performance testing. This paper will appear in Communications of the ACM. 2 Introduction Automatic text, or document, retrieval has recently become a topic of interest for those working in natural language processing (NLP). The aim of this article is to indicate the key properties of document retrieval, distinguishing it from both data retrieval and question answering; to summarize past exper...
Evaluating Text Categorization
- In Proceedings of Speech and Natural Language Workshop
, 1991
Cited by 76 (6 self)
While certain standard procedures are widely used for evaluating text retrieval systems and algorithms, the same is not true for text categorization. Omission of important data from reports is common and methods of measuring effectiveness vary widely. This has made judging the relative merits of techniques for text categorization difficult and has disguised important research issues. In this paper I discuss a variety of ways of evaluating the effectiveness of text categorization systems, drawing both on reported categorization experiments and on methods used in evaluating query-driven retrieval. I also consider the extent to which the same evaluation methods may be used with systems for text extraction, a more complex task. In evaluating either kind of system, the purpose for which the output is to be used is crucial in choosing appropriate evaluation methods. INTRODUCTION Text classification systems, i.e. systems which can make distinctions between meaningful classes of texts, have ...
Customizing a Lexicon to Better Suit a Computational Task
- Proc. of the Workshop on Extracting Lexical Knowledge
, 1996
Cited by 49 (2 self)
We discuss a method for augmenting and rearranging a structured lexicon in order to make it more suitable for a topic labeling task, by making use of lexical association information from a large text corpus. We first describe an algorithm for converting the hierarchical structure of WordNet [13] into a set of flat categories. We then use lexical co-occurrence statistics in combination with these categories to classify proper names, assign more specific senses to broadly defined terms, and classify new words into existing categories. We also describe how to use these statistics to assign schema-like information to the categories and show how the new categories improve a text-labeling algorithm. In effect, we provide a mechanism for successfully combining a hand-built lexicon with knowledge-free, statistically-derived information.
Information Extraction: Beyond Document Retrieval
- COMPUTATIONAL LINGUISTICS AND CHINESE LANGUAGE PROCESSING
, 1998
Cited by 48 (10 self)
In this paper we give a synoptic view of the growth of the text processing technology of information extraction (IE), whose function is to extract information about a pre-specified set of entities, relations or events from natural language texts and to record this information in structured representations called templates. Here we describe the nature of the IE task, review the history of the area from its origins in AI work in the 1960's and 70's till the present, discuss the techniques being used to carry out the task, describe application areas where IE systems are or are about to be at work, and conclude with a discussion of the challenges facing the area. What emerges is a picture of an exciting new text processing technology with a host of new applications, both on its own and in conjunction with other technologies, such as information retrieval, machine translation and data mining.
Interface Agents that Learn: An Investigation of Learning Issues in a Mail Agent Interface
, 1995
Cited by 45 (10 self)
In recent years, interface agents have been developed to assist users with various tasks. Some systems employ machine learning techniques to allow the agent to adapt to the user's changing requirements. With the increase in the volume of data on the Internet, agents have emerged which are able to monitor and learn from their users to identify topics of interest. One such agent, described here, has been developed to filter mail messages. We examine the issues involved in constructing an autonomous interface agent which employs a learning component, and explore the use of two different learning techniques in this context. Submitted to Applied Artificial Intelligence Journal. October 26, 1 Introduction Agents were once seen as anthropomorphic entities which would assist users with daily tasks. They could be used, for example, to locate information of interest to their user (Kay 1984). Ten years later, many definitions of agents have been proposed. The basic concept of ...

