Automatic extraction of subcategorization from corpora
- In Proceedings of the 5th ACL Conference on Applied Natural Language Processing
, 1997
Cited by 176 (7 self)
We describe a novel technique and implemented system for constructing a subcategorization dictionary from textual corpora. Each dictionary entry encodes the relative frequency of occurrence of a comprehensive set of subcategorization classes for English. An initial experiment, on a sample of 14 verbs which exhibit multiple complementation patterns, demonstrates that the technique achieves accuracy comparable to previous approaches, which are all limited to a highly restricted set of subcategorization classes. We also demonstrate that a subcategorization dictionary built with the system improves the accuracy of a parser by an appreciable amount.
Subcategorization Acquisition
, 2002
Cited by 64 (13 self)
Manual development of large subcategorised lexicons has proved difficult because predicates change behaviour between sublanguages, domains and over time. Yet access to a comprehensive subcategorization lexicon is vital for successful parsing capable of recovering predicate-argument relations, and probabilistic parsers would greatly benefit from accurate information concerning the relative likelihood of different subcategorisation frames (SCFs) of a given predicate. Acquisition of subcategorization lexicons from textual corpora has recently become increasingly popular. Although this work has met with some success, resulting lexicons indicate a need for greater accuracy. One significant source of error lies in the statistical filtering used for hypothesis selection, i.e. for removing noise from automatically acquired SCFs. This thesis builds on earlier work in verbal subcategorization acquisition, taking as a starting point the problem with statistical filtering. Our investigation shows that statistical filters tend to work poorly because not only is the underlying distribution Zipfian, but there is also very little correlation between conditional distribution of
Lexical Semantic Techniques for Corpus Analysis
, 1993
Cited by 58 (6 self)
In this paper we outline a research program for computational linguistics, making extensive use of text corpora. We demonstrate how a semantic framework for lexical knowledge can suggest richer relationships among words in text beyond that of simple co-occurrence. The work suggests how linguistic phenomena such as metonymy and polysemy might be exploitable for semantic tagging of lexical items. Unlike with purely statistical collocational analyses, the framework of a semantic theory allows the automatic construction of predictions about deeper semantic relationships among words appearing in collocational systems. We illustrate the approach for the acquisition of lexical information for several classes of nominals, and how such techniques can fine-tune the lexical structures acquired from an initial seeding of a machine-readable dictionary. In addition to conventional lexical semantic relations, we show how information concerning lexical presuppositions and preference relations can also be acquired from corpora, when analyzed with the appropriate semantic tools. Finally, we discuss the potential that corpus studies have for enriching the data set for theoretical linguistic research, as well as helping to confirm or disconfirm linguistic hypotheses.
Term Clustering of Syntactic Phrases
- Proceedings of ACM SIGIR-90
, 1990
Cited by 56 (5 self)
Term clustering and syntactic phrase formation are methods for transforming natural language text. Both have had only mixed success as strategies for improving the quality of text representations for document retrieval. Since the strengths of these methods are complementary, we have explored combining them to produce superior representations. In this paper we discuss our implementation of a syntactic phrase generator, as well as our preliminary experiments with producing phrase clusters. These experiments show small improvements in retrieval effectiveness resulting from the use of phrase clusters, but it is clear that corpora much larger than standard information retrieval test collections will be required to thoroughly evaluate the use of this technique.
Using Semantic Preferences to Identify Verbal Participation in Role Switching Alternations
, 2000
Cited by 23 (1 self)
We propose a method for identifying diathesis alternations where a particular argument type is seen in slots which have different grammatical roles in the alternating forms. The method uses selectional preferences acquired as probability distributions over WordNet. Preferences for the target slots are compared using a measure of distributional similarity. The method is evaluated on the causative and conative alternations, but is generally applicable and does not require a priori knowledge specific to the alternation.
The Representation of Lexical Semantic Information
- University of Sussex
, 1992
Cited by 21 (1 self)
This thesis is an investigation of the representation of lexical semantic information from a computational linguistic perspective. An implemented representation language is described which is not specific to lexical semantics, but is based on the use of typed feature structures augmented with default operations. This language, which is formally specified, allows the lexical semantic representations to be tightly integrated with the syntactic component of the lexical sign, capturing generalisations by use of inheritance, while allowing for exceptions with the default mechanism. Default inheritance and default unification are discussed in detail. Grammar rules and lexical rules can be specified in the same formalism and thus the paradigmatic treatment of lexical semantics can be integrated with an account at the syntagmatic level. The use of the language is illustrated with some examples of the representation of verbs, the treatment of logical metonymy and of sense extension. This is followe...
A Large Subcategorization Lexicon for Natural Language Processing Applications
- In Proceedings of LREC
, 2006
Cited by 21 (9 self)
We introduce a large computational subcategorization lexicon which includes subcategorization frame (SCF) and frequency information for 6,397 English verbs. This extensive lexicon was acquired automatically from five corpora and the Web using the current version of the comprehensive subcategorization acquisition system of Briscoe and Carroll (1997). The lexicon is provided freely for research use, along with a script which can be used to filter and build sub-lexicons suited for different natural language processing (NLP) purposes. Documentation is also provided which explains each sub-lexicon option and evaluates its accuracy.
Lexical Semantics of Adjectives: A Microtheory Of Adjectival Meaning
, 1995
Cited by 20 (5 self)
This work belongs to a family of research efforts, called microtheories and aimed at describing the static meaning of all lexical categories in several languages in the framework of the MikroKosmos project on computational semantics. The latter also involves other static microtheories describing world knowledge and syntax-semantics mapping as well as dynamic microtheories connected with the actual process of text analysis. This paper describes our approach to determining and representing adjectival meaning, compares it with the body of knowledge on adjectives in literature and presents a detailed, practically tested methodology and heuristics for the acquisition of lexical entries for adjectives. The work was based on the set of over 6,000 English and about 1,500 Spanish adjectives obtained from task-oriented corpora.
Introduction: The topic of this paper is the information about adjectival meaning which should be included in a computational lexicon. Thus, we concentrate on...
Lexical Acquisition at the Syntax-Semantics Interface: Diathesis Alternations, Subcategorization Frames and Selectional Preferences.
, 2001
Cited by 17 (1 self)
[Figure 2.4: LDOCE semantic space. Node labels: Concrete, inanimate, animate, liquid, gas, plant, animal, human, solid, moveable, not-moveable.]
... space by keeping to a simple hierarchy. However, it seems likely that a lot of specific predicates will not be adequately catered for. For example, given the 16 core categories depicted in figure 2.4 the direct object slot of sail would have to be accounted for by the movable class, when a more specific classification would be useful to distinguish, for example, cars, stones and ships. There are now WordNet versions for some European languages other than English (Vossen, 1999). For other languages, producing a new man-made hierarchy is not an easy alternative. The coverage needed for even a restricted domain requires considerable human effort. The noun hyponym hierarchy of WordNet is used as the representation medium for the preferences within this thesis. This makes our preferences prone to the human error inherent in the hierarchy and characteristic of any manmade resource. However, this is to some extent outweighed by the rigorous human effort that has gone into creating this useful taxonomy. WordNet has in excess of 60,000 classes in the hyponym hierarchy with over 88,000 word forms (version 1.5). Using current automatic classification methods for building a hierarchy of reasonable size would require considerable effort in post-editing to avoid incongruous classes and considerable processing time in the first place (Resnik, 1993a). The preferences we obtain are limited to the distinctions made within WordNet. Using corpus data does, to some extent, allow us to obtain preferences for the sublanguage of the corpus, since areas of WordNet that are not relevant to the domain have negligible frequency counts.
2.3 The WordNet Approaches
There is a...

