Results 11 - 20
of
113
Clustering Polysemic Subcategorization Frame Distributions Semantically
- IN PROC. OF THE 41ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS
, 2003
"... Previous research has demonstrated the utility of clustering in inducing semantic verb classes from undisambiguated corpus data. We describe a new approach which involves clustering subcategorization frame (SCF) distributions using the Information Bottleneck and nearest neighbour methods. In ..."
Abstract
-
Cited by 23 (7 self)
- Add to MetaCart
Previous research has demonstrated the utility of clustering in inducing semantic verb classes from undisambiguated corpus data. We describe a new approach which involves clustering subcategorization frame (SCF) distributions using the Information Bottleneck and nearest neighbour methods. In contrast to previous work, we particularly focus on clustering polysemic verbs. A novel evaluation scheme is proposed which accounts for the effect of polysemy on the clusters, offering us a good insight into the potential and limitations of semantically classifying undisambiguated SCF data.
Location Privacy In Ubiquitous Computing
, 2005
"... The field of ubiquitous computing envisages an era when the average consumer owns hundreds or thousands of mobile and embedded computing devices. These devices will perform actions based on the context of their users, and therefore ubiquitous systems will gather, collate and distribute much more per ..."
Abstract
-
Cited by 23 (0 self)
- Add to MetaCart
The field of ubiquitous computing envisages an era when the average consumer owns hundreds or thousands of mobile and embedded computing devices. These devices will perform actions based on the context of their users, and therefore ubiquitous systems will gather, collate and distribute much more personal information about individuals than computers do today. Much of this personal information will be considered private, and therefore mechanisms which allow users to control the dissemination of these data are vital. Location information is a particularly useful form of context in ubiquitous computing, yet its unconditional distribution can be very invasive.
A Large Subcategorization Lexicon for Natural Language Processing Applications
- In Proceedings of LREC
, 2006
"... We introduce a large computational subcategorization lexicon which includes subcategorization frame (SCF) and frequency information for 6,397 English verbs. This extensive lexicon was acquired automatically from five corpora and the Web using the current version of the comprehensive subcategorizatio ..."
Abstract
-
Cited by 21 (9 self)
- Add to MetaCart
We introduce a large computational subcategorization lexicon which includes subcategorization frame (SCF) and frequency information for 6,397 English verbs. This extensive lexicon was acquired automatically from five corpora and the Web using the current version of the comprehensive subcategorization acquisition system of Briscoe and Carroll (1997). The lexicon is provided freely for research use, along with a script which can be used to filter and build sub-lexicons suited for different natural language processing (NLP) purposes. Documentation is also provided which explains each sub-lexicon option and evaluates its accuracy. 1.
2006b. Models for sentence compression: A comparison across domains, training requirements and evaluation measures
- In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics
"... Sentence compression is the task of producing a summary at the sentence level. This paper focuses on three aspects of this task which have not received detailed treatment in the literature: training requirements, scalability, and automatic evaluation. We provide a novel comparison between a supervis ..."
Abstract
-
Cited by 21 (2 self)
- Add to MetaCart
Sentence compression is the task of producing a summary at the sentence level. This paper focuses on three aspects of this task which have not received detailed treatment in the literature: training requirements, scalability, and automatic evaluation. We provide a novel comparison between a supervised constituentbased and an weakly supervised wordbased compression algorithm and examine how these models port to different domains (written vs. spoken text). To achieve this, a human-authored compression corpus has been created and our study highlights potential problems with the automatically gathered compression corpora currently used. Finally, we assess whether automatic evaluation measures can be used to determine compression quality. 1
Object-Extraction and Question-Parsing Using CCG
- In Proceedings of the EMNLP Conference
, 2004
"... Accurate dependency recovery has recently been reported for a number of wide-coverage statistical parsers using Combinatory Categorial Grammar (CCG). However, overall figures give no indication of a parser’s performance on specific constructions, nor how suitable a parser is for specific application ..."
Abstract
-
Cited by 17 (9 self)
- Add to MetaCart
Accurate dependency recovery has recently been reported for a number of wide-coverage statistical parsers using Combinatory Categorial Grammar (CCG). However, overall figures give no indication of a parser’s performance on specific constructions, nor how suitable a parser is for specific applications. In this paper we give a detailed evaluation of a CCG parser on object extraction dependencies found in WSJ text. We also show how the parser can be used to parse questions for Question Answering. The accuracy of the original parser on questions is very poor, and we propose a novel technique for porting the parser to a new domain, by creating new labelled data at the lexical category level only. Using a supertagger to assign categories to words, trained on the new data, leads to a dramatic increase in question parsing accuracy. 1
Meaningful Clustering of Senses Helps Boost Word Sense Disambiguation Performance
- In: Proceedings of COLING-ACL
, 2006
"... Fine-grained sense distinctions are one of the major obstacles to successful Word Sense Disambiguation. In this paper, we present a method for reducing the granularity of the WordNet sense inventory based on the mapping to a manually crafted dictionary encoding sense hierarchies, namely the Oxford D ..."
Abstract
-
Cited by 16 (4 self)
- Add to MetaCart
Fine-grained sense distinctions are one of the major obstacles to successful Word Sense Disambiguation. In this paper, we present a method for reducing the granularity of the WordNet sense inventory based on the mapping to a manually crafted dictionary encoding sense hierarchies, namely the Oxford Dictionary of English. We assess the quality of the mapping and the induced clustering, and evaluate the performance of coarse WSD systems in the Senseval-3 English all-words task. 1
Extended Lexical-Semantic Classification of English Verbs
- IN PROCEEDINGS OF THE HLT/NAACL WORKSHOP ON COMPUTATIONAL LEXICAL SEMANTICS
, 2004
"... Lexical-semantic verb classifications have proved useful in supporting various natural language processing (NLP) tasks. The largest and the most widely deployed classification in English is Levin's (1993) taxonomy of verbs and their classes. While this resource is attractive in being extensive ..."
Abstract
-
Cited by 15 (1 self)
- Add to MetaCart
Lexical-semantic verb classifications have proved useful in supporting various natural language processing (NLP) tasks. The largest and the most widely deployed classification in English is Levin's (1993) taxonomy of verbs and their classes. While this resource is attractive in being extensive enough for some NLP use, it is not comprehensive. In this paper, we present a substantial extension to Levin's taxonomy which incorporates 57 novel classes for verbs not covered (comprehensively) by Levin. We also
Measures and Applications of Lexical Distributional Similarity
, 2003
"... This thesis is concerned with the measurement and application of lexical distributional similarity. Two words are said to be distributionally similar if they appear in similar contexts. This loose definition, however, has led to many measures being proposed or adopted from fields such as geometry, s ..."
Abstract
-
Cited by 14 (0 self)
- Add to MetaCart
This thesis is concerned with the measurement and application of lexical distributional similarity. Two words are said to be distributionally similar if they appear in similar contexts. This loose definition, however, has led to many measures being proposed or adopted from fields such as geometry, statistics, Information Retrieval (IR) and Information Theory. Our aim is to investigate the properties which make a good measure of lexical distributional similarity. We start by introducing the concept of lexical distributional similarity. We discuss potential applications, which can be roughly divided into distributional or language modelling applications and semantic applications, and methods of evaluation (Chapter 2). We look at existing measures of distributional similarity and carry out an empirical comparison of fifteen of these measures, paying particular attention to the effects of word frequency (Chapter 3). We propose a new general framework for distributional similarity based on the context of lexical substitutability, which me measure using the IR concepts of precision and recall. This framework allows us to investigate the key factors in similarity of asymmetry, the relative influence of different contexts and the extent to which words share a context (Chapter 4). Finally, we consider the application of distributional similarity in language modelling (Chapter 5) and as a predictor of semantic similarity using human judgements of similarity and a spelling correction task (Chapter 6).
Constraint-based sentence compression: An integer programming approach
- In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL
, 2006
"... The ability to compress sentences while preserving their grammaticality and most of their meaning has recently received much attention. Our work views sentence compression as an optimisation problem. We develop an integer programming formulation and infer globally optimal compressions in the face of ..."
Abstract
-
Cited by 14 (0 self)
- Add to MetaCart
The ability to compress sentences while preserving their grammaticality and most of their meaning has recently received much attention. Our work views sentence compression as an optimisation problem. We develop an integer programming formulation and infer globally optimal compressions in the face of linguistically motivated constraints. We show that such a formulation allows for relatively simple and knowledge-lean compression models that do not require parallel corpora or largescale resources. The proposed approach yields results comparable and in some cases superior to state-of-the-art. 1
Evaluating the Accuracy of an Unlexicalized Statistical Parser on the PARC DepBank
- In Proceedings of the Poster Session of COLING/ACL-06
, 2006
"... We evaluate the accuracy of an unlexicalized statistical parser, trained on 4K treebanked sentences from balanced data and tested on the PARC DepBank. We demonstrate that a parser which is competitive in accuracy (without sacrificing processing speed) can be quickly tuned without reliance on large i ..."
Abstract
-
Cited by 13 (1 self)
- Add to MetaCart
We evaluate the accuracy of an unlexicalized statistical parser, trained on 4K treebanked sentences from balanced data and tested on the PARC DepBank. We demonstrate that a parser which is competitive in accuracy (without sacrificing processing speed) can be quickly tuned without reliance on large in-domain manuallyconstructed treebanks. This makes it more practical to use statistical parsers in applications that need access to aspects of predicate-argument structure. The comparison of systems using DepBank is not straightforward, so we extend and validate DepBank and highlight a number of representation and scoring issues for relational evaluation schemes. 1

