Noun Homograph Disambiguation Using Local Context in Large Text Corpora
University of Waterloo, 1991
Cited by 71 (1 self)
This paper describes an accurate, relatively inexpensive method for the disambiguation of noun homographs using large text corpora. The algorithm checks the context surrounding the target noun against that of previously observed instances and chooses the sense for which the most evidence is found, where evidence consists of a set of orthographic, syntactic, and lexical features. Because the sense distinctions made are coarse, the disambiguation can be accomplished without the expense of knowledge bases or inference mechanisms. An implementation of the algorithm is described which, starting with a small set of hand-labeled instances, improves its results automatically via unsupervised training. The approach is compared to other attempts at homograph disambiguation using both machine-readable dictionaries and unrestricted text, and the use of training instances is determined to be a crucial difference.
1 Introduction
Large text corpora and the computational resources to handle them have ...
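The evidence-counting idea in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the feature set (`extract_features`), the class name `EvidenceDisambiguator`, the window size, and the `min_margin` confidence rule for self-training are all hypothetical, and the syntactic features the abstract mentions are omitted.

```python
from collections import Counter, defaultdict

def extract_features(tokens, i, window=2):
    """Hypothetical feature set for the target noun at tokens[i]."""
    feats = set()
    target = tokens[i]
    # Orthographic evidence (illustrative only)
    if target[:1].isupper():
        feats.add("ortho:capitalized")
    if target.lower().endswith("s"):
        feats.add("ortho:ends_in_s")
    # Lexical evidence: words within +/- window of the target, position ignored
    for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
        if j != i:
            feats.add("lex:" + tokens[j].lower())
    return feats

class EvidenceDisambiguator:
    """Pick the sense whose previously observed contexts share the most features."""

    def __init__(self):
        # counts[feature][sense] = how often the feature co-occurred with that sense
        self.counts = defaultdict(Counter)

    def train(self, tokens, i, sense):
        for f in extract_features(tokens, i):
            self.counts[f][sense] += 1

    def score(self, tokens, i):
        evidence = Counter()
        for f in extract_features(tokens, i):
            evidence.update(self.counts.get(f, Counter()))
        return evidence

    def disambiguate(self, tokens, i):
        evidence = self.score(tokens, i)
        return evidence.most_common(1)[0][0] if evidence else None

    def self_train(self, unlabeled, min_margin=2):
        """Label (tokens, i) pairs and train on those whose winning sense leads
        the runner-up by at least min_margin; counts are updated as it goes."""
        added = 0
        for tokens, i in unlabeled:
            ranked = self.score(tokens, i).most_common(2)
            if not ranked:
                continue
            runner_up = ranked[1][1] if len(ranked) > 1 else 0
            if ranked[0][1] - runner_up >= min_margin:
                self.train(tokens, i, ranked[0][0])
                added += 1
        return added

d = EvidenceDisambiguator()
d.train("the river bank was muddy".split(), 2, "shore")
d.train("the bank raised interest rates".split(), 1, "finance")
print(d.disambiguate("a muddy bank of the river".split(), 2))  # → shore
```

Starting from two hand-labeled instances, `self_train` mimics the abstract's description of improving results without further labeling; the margin threshold is one simple way to avoid training on weakly supported guesses.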
Internal and External Evidence in the Identification and Semantic Categorization of Proper Names
Corpus Processing for Lexical Acquisition, 1996
Cited by 55 (0 self)
We describe the proper name recognition and classification facility ("PNF") of the SPARSER natural language understanding system. PNF has been used very successfully in the analysis of unrestricted texts in several sublanguages taken from online news sources. It makes its categorizations on the basis of 'external' evidence from the context of the phrases adjacent to the name as well as 'internal' evidence within the sequence of words and characters. A semantic model of each name and its components is maintained and used for subsequent reference.
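A toy version of combining internal and external evidence, plus a per-name record reused for later references, might look like this. The cue tables, `NameRegistry`, and `classify` are hypothetical stand-ins: the abstract does not give PNF's actual rules, and SPARSER's semantic model of names is far richer than a word-to-category map.

```python
# Hypothetical cue tables; PNF's real evidence rules are not listed in the abstract.
INTERNAL_CUES = {"Inc.": "company", "Corp.": "company", "University": "organization"}
EXTERNAL_CUES = {"Mr.": "person", "Ms.": "person", "Dr.": "person"}

class NameRegistry:
    """Remembers each categorized name's component words for subsequent references."""

    def __init__(self):
        self.by_component = {}  # component word -> category of the first name it appeared in

    def add(self, name_tokens, category):
        for tok in name_tokens:
            self.by_component.setdefault(tok, category)

    def resolve(self, name_tokens):
        for tok in name_tokens:
            if tok in self.by_component:
                return self.by_component[tok]
        return None

def classify(name_tokens, left_context, registry):
    """Return (category, evidence_source) for a capitalized name sequence."""
    category, source = "unknown", None
    internal = [INTERNAL_CUES[t] for t in name_tokens if t in INTERNAL_CUES]
    if internal:  # internal evidence: words inside the name itself
        category, source = internal[0], "internal"
    elif left_context and left_context[-1] in EXTERNAL_CUES:  # external: adjacent context
        category, source = EXTERNAL_CUES[left_context[-1]], "external"
    else:  # fall back on names already seen in the text
        known = registry.resolve(name_tokens)
        if known is not None:
            category, source = known, "registry"
    if category != "unknown":
        registry.add(name_tokens, category)
    return category, source

reg = NameRegistry()
print(classify(["John", "Smith"], ["said", "Mr."], reg))   # → ('person', 'external')
print(classify(["Acme", "Corp."], ["shares", "of"], reg))  # → ('company', 'internal')
print(classify(["Smith"], ["according", "to"], reg))       # → ('person', 'registry')
```

Checking internal evidence first is an arbitrary ordering chosen for the sketch; a real system would weigh conflicting cues rather than stop at the first match.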
Text Generation in a Dynamic Hypertext Environment
In Proceedings of the 19th Australasian Computer Science Conference, 1996
Cited by 50 (12 self)
This paper describes PEBA-II, a working natural language generation system which interactively describes animals in a taxonomic knowledge base via the production of World Wide Web pages. Our aim is to construct a natural language document generation system with real practical applicability: to this end, the system reconstructs and combines a number of existing ideas in the literature in a novel way, and proposes a solution to the problem of breadth of coverage that is based on a pragmatic approach to knowledge representation and linguistic realisation. The system embodies the following features:
• a reconstruction of some of the core ideas in schema-based text generation [McKeown 1985], applied to the generation of hypertext documents;
• the principled use of a phrasal lexicon to ease surface generation, in concert with a knowledge base whose elements may correspond to pre-compiled collections of atomic units;
• a user model and discourse model that permit interesting varia...
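The core idea of schema-based generation applied to hypertext can be illustrated with a very small sketch: an ordered schema of predicates is filled from a knowledge-base entry, and any value that has its own entry becomes a link. The knowledge base, schema, templates, and the `describe`/`link` helpers are all hypothetical; McKeown-style schemas and PEBA-II's phrasal lexicon are considerably richer than string templates.

```python
from html import escape

# Hypothetical taxonomic knowledge base (not PEBA-II's representation).
KB = {
    "echidna": {"isa": "monotreme", "body_covering": "spines", "diet": "ants and termites"},
    "monotreme": {"isa": "mammal"},
}

# A schema: an ordered list of (predicate, sentence template) pairs.
IDENTIFY_SCHEMA = [
    ("isa", "The {entity} is a {value}."),
    ("body_covering", "It is covered in {value}."),
    ("diet", "It eats {value}."),
]

def link(name):
    """Hyperlink a value that has its own KB entry; otherwise return escaped text."""
    return f'<a href="{escape(name)}.html">{escape(name)}</a>' if name in KB else escape(name)

def describe(entity, schema=IDENTIFY_SCHEMA):
    """Fill the schema in order, skipping slots the KB cannot fill, and emit HTML."""
    facts = KB[entity]
    sentences = [
        template.format(entity=escape(entity), value=link(facts[predicate]))
        for predicate, template in schema
        if predicate in facts
    ]
    return f"<h1>{escape(entity)}</h1>\n<p>{' '.join(sentences)}</p>"

print(describe("echidna"))
```

Because every linked value is itself a KB entity, each generated page points to other pages the same function can generate, which is the basic shape of a dynamic hypertext.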
Bilexical Grammars And Their Cubic-Time Parsing Algorithms
In: New Developments in Natural Language Parsing, 2000
Cited by 40 (1 self)
This chapter introduces weighted bilexical grammars, a formalism in which individual lexical items, such as verbs and their arguments, can have idiosyncratic selectional influences on each other. Such ‘bilexicalism’ has been a theme of much current work in parsing. The new formalism can be used to describe bilexical approaches to both dependency and phrase-structure grammars, and a slight modification yields link grammars. Its scoring approach is compatible with a wide variety of probability models. The obvious parsing algorithm for bilexical grammars (used by most previous authors) takes time O(n^5). A more efficient O(n^3) method is exhibited. The new algorithm has been implemented and used in a large parsing experiment (Eisner, 1996b). We also give a useful extension to the case where the parser must undo a stochastic transduction that has altered the input.
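A widely used instance of the O(n^3) idea is the first-order projective dependency parser usually called Eisner's algorithm, sketched below. It is a simplified case of the chapter's formalism, not its general weighted bilexical grammar parser: arc scores come from an assumed matrix `scores[h][m]`, index 0 is an artificial ROOT, and ROOT is allowed to take more than one child. The key trick is that chart items are spans headed at one end, so each combination step only ranges over a split point, giving three nested loops.

```python
import math

NEG = -math.inf

def eisner(scores):
    """First-order projective dependency parsing in O(n^3).

    scores[h][m] is the score of an arc from head h to modifier m; index 0 is
    an artificial ROOT. Returns (best_score, heads) with heads[0] == -1.
    """
    n = len(scores)
    # Direction d: 0 = span [s, t] headed by its right end t, 1 = headed by its left end s.
    complete = [[[NEG, NEG] for _ in range(n)] for _ in range(n)]
    incomplete = [[[NEG, NEG] for _ in range(n)] for _ in range(n)]
    bp_complete = [[[0, 0] for _ in range(n)] for _ in range(n)]
    bp_incomplete = [[[0, 0] for _ in range(n)] for _ in range(n)]
    for s in range(n):
        complete[s][s][0] = complete[s][s][1] = 0.0

    for width in range(1, n):
        for s in range(n - width):
            t = s + width
            # Incomplete span: two complete halves meeting at split r, plus one arc between s and t.
            best, r = max(((complete[s][r][1] + complete[r + 1][t][0], r)
                           for r in range(s, t)), key=lambda p: p[0])
            incomplete[s][t][0] = best + (scores[t][s] if s > 0 else NEG)  # arc t -> s; ROOT is never a modifier
            incomplete[s][t][1] = best + scores[s][t]  # arc s -> t
            bp_incomplete[s][t][0] = bp_incomplete[s][t][1] = r
            # Complete span headed by t: complete [s, r] headed by r, then arc t -> r.
            complete[s][t][0], bp_complete[s][t][0] = max(
                ((complete[s][r][0] + incomplete[r][t][0], r) for r in range(s, t)),
                key=lambda p: p[0])
            # Complete span headed by s: arc s -> r, then complete [r, t] headed by r.
            complete[s][t][1], bp_complete[s][t][1] = max(
                ((incomplete[s][r][1] + complete[r][t][1], r) for r in range(s + 1, t + 1)),
                key=lambda p: p[0])

    heads = [-1] * n

    def backtrack(s, t, d, is_complete):
        """Follow back-pointers to recover the head of every word."""
        if s == t:
            return
        if is_complete:
            r = bp_complete[s][t][d]
            if d == 0:
                backtrack(s, r, 0, True)
                backtrack(r, t, 0, False)
            else:
                backtrack(s, r, 1, False)
                backtrack(r, t, 1, True)
        else:
            r = bp_incomplete[s][t][d]
            if d == 0:
                heads[s] = t
            else:
                heads[t] = s
            backtrack(s, r, 1, True)
            backtrack(r + 1, t, 0, True)

    backtrack(0, n - 1, 1, True)
    return complete[0][n - 1][1], heads

scores = [[0.0] * 4 for _ in range(4)]  # ROOT + 3 words, toy scores
scores[0][2] = 10.0  # ROOT -> word 2
scores[2][1] = 10.0  # word 2 -> word 1
scores[2][3] = 10.0  # word 2 -> word 3
print(eisner(scores))  # → (30.0, [-1, 2, 0, 2])
```

The naive approach keeps head words inside spans as extra indices, which is where the O(n^5) cost comes from; pushing heads to span endpoints removes those indices.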
Phred: A Generator For Natural Language Interfaces
Computational Linguistics, 1985
Cited by 38 (2 self)
... this paper is similar to the unification procedure in TELEGRAM (Appelt 1983), which employs a unification grammar ...
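Since the snippet above turns on unification, here is a minimal sketch of feature-structure unification over nested dictionaries. It is an assumption-laden illustration, not Phred's or TELEGRAM's procedure: reentrancy (shared substructures) is not modeled, and the `FAIL` sentinel and the example features are hypothetical.

```python
FAIL = object()  # sentinel for unification failure

def unify(a, b):
    """Unify two feature structures given as nested dicts with atomic leaves.

    Returns the merged structure, or FAIL if any shared feature has
    conflicting values. Reentrancy is not modeled.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for feature, b_value in b.items():
            if feature in result:
                merged = unify(result[feature], b_value)
                if merged is FAIL:
                    return FAIL
                result[feature] = merged
            else:
                result[feature] = b_value
        return result
    # Atoms (or an atom vs. a dict) unify only if they are equal.
    return a if a == b else FAIL

np_phrase = {"cat": "NP", "agr": {"num": "sg"}}
verb_requirement = {"agr": {"num": "sg", "per": 3}}
print(unify(np_phrase, verb_requirement))
# → {'cat': 'NP', 'agr': {'num': 'sg', 'per': 3}}
print(unify(np_phrase, {"agr": {"num": "pl"}}) is FAIL)  # → True
```

Unification both checks compatibility (failing on the number clash) and accumulates information (adding `per`), which is what makes it useful for combining partial specifications during generation.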
An Approach to Natural Language Processing for Document Retrieval
In Proceedings of the 10th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR-87), 1996
Cited by 16 (0 self)
Document retrieval systems have been restricted, by the nature of the task, to techniques that can be used with large numbers of documents and broad domains. The most effective techniques that have been developed are based on the statistics of word occurrences in text. In this paper, we describe an approach to using natural language processing (NLP) techniques for what is essentially a natural language problem: the comparison of a request text with the text of document titles and abstracts. The proposed NLP techniques are used to develop a request model based on "conceptual case frames" and to compare this model with the texts of candidate documents. The request model is also used to provide information to statistical search techniques that identify the candidate documents. As part of a preliminary evaluation of this approach, case frame representations of a set of requests from the CACM collection were constructed. Statistical searches carried out using dependency and relative import...
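For contrast with the case-frame approach, the kind of word-occurrence statistics the abstract calls the most effective existing technique can be sketched as plain tf-idf ranking. This is a generic baseline chosen for illustration; the paper's actual statistical search (including the dependency and relative-importance information mentioned above) is not reproduced here, and `tfidf_rank` is a hypothetical helper.

```python
import math
from collections import Counter

def tfidf_rank(query, documents):
    """Rank document indices by the summed tf-idf weight of the query terms."""
    docs = [doc.lower().split() for doc in documents]
    n = len(docs)
    # Document frequency: number of documents containing each term
    df = Counter(term for doc in docs for term in set(doc))
    scored = []
    for idx, doc in enumerate(docs):
        tf = Counter(doc)
        score = sum(tf[t] * math.log(n / df[t])
                    for t in set(query.lower().split()) if df[t])
        scored.append((score, idx))
    # Highest score first; ties broken by original document order
    return [idx for score, idx in sorted(scored, key=lambda p: (-p[0], p[1]))]

docs = [
    "natural language processing for retrieval",
    "statistical retrieval of documents",
    "parsing natural language",
]
print(tfidf_rank("natural language retrieval", docs))  # → [0, 2, 1]
```

Such a ranker sees only term counts, so a request and a document that use the same words in different relationships score identically; that gap is what a structured request model is meant to address.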
wEBMT: Developing and Validating an Example-Based Machine Translation System using the World Wide Web
Computational Linguistics, 2003
Knowledge resource tools for accessing large text files
In Machine Translation: Theoretical and Methodological Issues. Sergei Nirenburg, 1987
Cited by 12 (0 self)
This paper provides an overview of a research program just being defined at Bellcore. The objective is to develop facilities for working with large document collections that provide more refined access to the information contained in these "source" materials than is possible through current information retrieval procedures. The tools being used for this purpose are machine-readable dictionaries, encyclopedias, and related "resources" that provide geographical, biographical, and other kinds of specialized knowledge. A major feature of the research program is the exploitation of the reciprocal relationships between sources and resources. These interactions between texts and tools are intended to support experts who organize and use information in a workstation environment. Two systems under development will be described to illustrate the approach: one providing capabilities for full-text subject assessment; the other for concept elaboration while reading text. Progress in the research depends critically on developments in artificial intelligence, computational linguistics, and information science to provide a scientific base, and on software engineering, database management, and distributed systems to provide the technology.
Skimming Newspaper Stories by Computer
1977
Cited by 11 (0 self)
... Program) is a system being developed at Yale to skim newspaper stories. The United Press International news service has recently been connected ...

