Subtopic Structuring for Full-Length Document Access
, 1993
Cited by 169 (8 self)
We argue that the advent of large volumes of full-length text, as opposed to short texts like abstracts and newswire, should be accompanied by corresponding new approaches to information access. Toward this end, we discuss the merits of imposing structure on full-length text documents; that is, a partition of the text into coherent multi-paragraph units that represent the pattern of subtopics that comprise the text. Using this structure, we can make a distinction between the main topics, which occur throughout the length of the text, and the subtopics, which are of only limited extent. We discuss why recognition of subtopic structure is important and how, to some degree of accuracy, it can be found. We describe a new way of specifying queries on full-length documents and then describe an experiment in which making use of the recognition of local structure achieves better results on a typical information retrieval task than does a standard IR measure.
The Rhetorical Parsing, Summarization, and Generation of Natural Language Texts
, 1997
Cited by 98 (9 self)
This thesis is an inquiry into the nature of the high-level, rhetorical structure of unrestricted natural language texts, computational means to enable its derivation, and two applications (in automatic summarization and natural language generation) that follow from the ability to build such structures automatically. The thesis proposes a first-order formalization of the high-level, rhetorical structure of text. The formalization assumes that text can be sequenced into elementary units; that discourse relations hold between textual units of various sizes; that some textual units are more important to the writer's purpose than others; and that trees are a good approximation of the abstract structure of text. The formalization also introduces a linguistically motivated compositionality criterion, which is shown to hold for the text structures that are valid. The thesis proposes, analyzes theoretically, and compares empirically four algorithms for determining the valid text structures of ...
Generating Summaries of Multiple News Articles
- In Proceedings, 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
, 1995
Cited by 91 (12 self)
So That Nobody Has To Go To School If They Don't Want To by Roger Sipher A decline in standardized test scores is but the most recent indicator that American education is in trouble. One reason for the crisis is that present mandatory-attendance laws force many to attend school who have no wish to be there. Such children have little desire to learn and are so antagonistic to school that neither they nor more highly motivated students receive the quality education that is the birthright of every American. The solution to this problem is simple: Abolish compulsory-attendance laws and allow only those who are committed to getting an education to attend. This will not end public education. Contrary to conventional belief, legislators enacted compulsory-attendance laws to legalize what already existed. William Landes and Lewis Solomon, economists, found little evidence that mandatory-attendance laws increased the number of children in school. They found, too, that school systems have never effectively enforced such laws, usually because of the expense involved. There is no contradiction between the assertion that compulsory attendance has had little effect on the number of children attending school and the argument that repeal would be a positive step toward improving education. Most parents want a high school education for their children. Unfortunately, compulsory attendance hampers the ability of public school officials to enforce legitimate educational and disciplinary policies and thereby make the education a good one. Private schools have no such problem. They can fail or dismiss students, knowing such students can attend public school. Without compulsory attendance, public schools would be freer to oust students whose academic or personal behavior undermines the educational mission of the institution. Has not the noble experiment of a formal education for everyone failed? 
While we pay homage to the homily, "You can lead a horse to water but you can't make him drink," we have pretended it is not true in education. Ask high school teachers if recalcitrant students learn anything of value. Ask teachers if these students do any homework. Quite the contrary, these students know they will be passed from grade to grade until they are old enough to quit or until, as is more likely, they receive a high school diploma. At the point when students could legally quit, most choose to remain since they know they are likely to be allowed to graduate whether they do acceptable work or not. Abolition of archaic attendance laws would produce enormous dividends. First, it would alert everyone that school is a serious place where one goes to learn. Schools are neither day-care centers nor indoor street corners. Young people who resist learning should stay away; indeed, an end to compulsory schooling would require them to stay away. Second, students opposed to learning would not be able to pollute the educational atmosphere for those who want to learn. Teachers could stop policing recalcitrant students and start educating. Third, grades would show what they are supposed to: how well a student is learning. Parents could again read report cards and know if their children were making progress. Fourth, public esteem for schools would increase. People would stop regarding them as way stations for adolescents and start thinking of them as institutions for educating America's youth. Fifth, elementary schools would change because students would find out early they had better learn something or risk flunking out later. Elementary teachers would no longer have to pass their failures on to junior high and high school. Sixth, the cost of enforcing compulsory education would be eliminated. Despite enforcement efforts, nearly 15 percent of the school-age children in our largest cities are almost permanently absent from school. 
Communities could use these savings to support institutions to deal with young people not in school. If, in the long run, these institutions prove more costly, at least we would not confuse their mission with that of schools. Schools should be for education. At present, they are only tangentially so. They have attempted to serve an all-encompassing social function, trying to be all things to all people. In the process they have failed miserably at what they were originally formed to accomplish.
Natural Language Processing for Information Retrieval
, 1996
Cited by 79 (2 self)
The paper summarizes the essential properties of document retrieval and reviews both conventional practice and research findings, the latter suggesting that simple statistical techniques can be effective. It then considers the new opportunities and challenges presented by the ability to search full text directly (rather than, e.g., titles and abstracts), and suggests appropriate approaches to doing this, with a focus on the role of natural language processing. The paper also comments on possible connections with data and knowledge retrieval, and concludes by emphasizing the importance of rigorous performance testing. This paper will appear in Communications of the ACM.
2 Introduction
Automatic text, or document, retrieval has recently become a topic of interest for those working in natural language processing (NLP). The aim of this article is to indicate the key properties of document retrieval, distinguishing it from both data retrieval and question answering; to summarize past exper...
What Might Be in a Summary?
- Information Retrieval 93: Von der Modellierung zur Anwendung
, 1993
Cited by 29 (0 self)
The paper presents a framework for, and strategies adopted in, an investigation of summarising designed to place future work on automatic summarising on solid foundations. The work reported has been focused on the role of large-scale text structure, and the paper describes comparative studies of different approaches to the characterisation of source text structure and to the use of this structure in summary formation.
2 Introduction
In this paper I shall describe some foundationally-motivated work we have been doing in Cambridge, establishing and exploiting a framework for automatic summarising. I shall introduce this account by noting some relations with indexing, and in conclusion return to the connection between the two forms of information capture and use. We can all, and do, summarise; and we use other people's summaries. Summarising, especially in the guise of extracting, was an early task for automation (Luhn, 1958). But not much progress has in fact been made with automatic su...
Generating Indicative-Informative Summaries with SumUM
- Computational Linguistics
, 2002
Cited by 28 (7 self)
We present and evaluate SumUM, a text summarization system that takes a raw technical text as input and produces an indicative informative summary. The indicative part of the summary identifies the topics of the document, and the informative part elaborates on some of these topics according to the reader’s interest. SumUM motivates the topics, describes entities, and defines concepts. It is a first step for exploring the issue of dynamic summarization. This is accomplished through a process of shallow syntactic and semantic analysis, concept identification, and text regeneration. Our method was developed through the study of a corpus of abstracts written by professional abstractors. Relying on human judgment, we have evaluated indicativeness, informativeness, and text acceptability of the automatic summaries. The results thus far indicate good performance when compared with other summarization technologies.
Cohesion and Collocation: Using Context Vectors in Text Segmentation
- In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics (Student Session)
, 1999
Cited by 16 (0 self)
Collocational word similarity is considered a source of text cohesion that is hard to measure and quantify. The work presented here explores the use of information from a training corpus in measuring word similarity and evaluates the method in the text segmentation task. An implementation, the VecTile system, produces similarity curves over texts using pre-compiled vector representations of the contextual behavior of words. The performance of this system is shown to improve over that of the purely string-based TextTiling algorithm (Hearst, 1997).
1 Background
The notion of text cohesion rests on the intuition that a text is "held together" by a variety of internal forces. Much of the relevant linguistic literature is indebted to Halliday and Hasan (1976), where cohesion is defined as a network of relationships between locations in the text, arising from (i) grammatical factors (co-reference, use of pro-forms, ellipsis and sentential connectives), and (ii) lexical factors (reiteration and collocation). Subsequent work has further developed this taxonomy (Hoey, 1991) and explored its implications in such areas as paragraphing (Longacre, 1979; Bond and Hayes, 1984; Stark, 1988), relevance (Sperber and Wilson, 1995) and discourse structure (Grosz and Sidner, 1986).
Domain-specific informative and indicative summarization for information retrieval
- Proc. of the Document Understanding Conference (DUC)
, 2001

