Concept Discovery from Text
- In Proceedings of Conference on Computational Linguistics, 2002
- Cited by 47 (1 self)
Broad-coverage lexical resources such as WordNet are extremely useful. However, they often include many rare senses while missing domain-specific senses. We present a clustering algorithm called CBC (Clustering By Committee) that automatically discovers concepts from text. It initially discovers a set of tight clusters called committees that are well scattered in the similarity space. The centroid of the members of a committee is used as the feature vector of the cluster. We proceed by assigning elements to their most similar cluster. Evaluating cluster quality has always been a difficult task. We present a new evaluation methodology that is based on the editing distance between output clusters and classes extracted from WordNet (the answer key). Our experiments show that CBC outperforms several well-known clustering algorithms in cluster quality.
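The centroid and assignment steps described in this abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation: the committee discovery phase is not shown, cosine similarity is an assumed similarity measure (the abstract does not name one), and the words, features, and committee names are invented toy data.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse feature vectors (dicts)."""
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def centroid(vectors):
    """Average a list of sparse vectors feature by feature."""
    total = {}
    for vec in vectors:
        for k, x in vec.items():
            total[k] = total.get(k, 0.0) + x
    return {k: x / len(vectors) for k, x in total.items()}

def assign(elements, committees):
    """Assign every element to the committee whose centroid is most similar."""
    centroids = {
        name: centroid([elements[m] for m in members])
        for name, members in committees.items()
    }
    assignment = {}
    for word, vec in elements.items():
        assignment[word] = max(centroids, key=lambda c: cosine(vec, centroids[c]))
    return assignment

# Hypothetical toy data: word -> context-feature counts.
elements = {
    "wine": {"drink": 3, "bottle": 2},
    "beer": {"drink": 4, "bottle": 1},
    "juice": {"drink": 2, "glass": 1},
    "car": {"drive": 3, "park": 1},
    "truck": {"drive": 2, "load": 2},
    "bus": {"drive": 1, "ride": 2},
}
# Committees are assumed to be given; discovering them is the hard part.
committees = {"beverage": ["wine", "beer"], "vehicle": ["car", "truck"]}

assignment = assign(elements, committees)
print(assignment)
```

Here "juice" and "bus" belong to no committee, yet each is still placed with the cluster whose centroid it resembles most, which is the point of separating committee members from the elements assigned afterwards.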
Producing Biographical Summaries: Combining Linguistic Knowledge with Corpus Statistics
- In Proceedings European Association for Computational Linguistics, 2001
- Cited by 36 (5 self)
We describe a biographical multidocument summarizer that summarizes information about people described in the news. The summarizer uses corpus statistics along with linguistic knowledge to select and merge descriptions of people from a document collection, removing redundant descriptions. The summarization components have been extensively evaluated for coherence, accuracy, and non-redundancy of the descriptions produced.
A Classification Approach to Word Prediction
- 2000
- Cited by 33 (8 self)
The eventual goal of a language model is to accurately predict the value of a missing word given its context. We present an approach to word prediction that is based on learning a representation for each word as a function of words and linguistic predicates in its context. This approach raises a few new questions that we address. First, in order to learn good word representations it is necessary to use an expressive representation of the context. We present a way that uses external knowledge to generate expressive context representations, along with a learning method capable of handling the large number of features generated this way that can, potentially, contribute to each prediction. Second, since the number of words "competing" for each prediction is large, there is a need to "focus the attention" on a smaller subset of these. We exhibit the contribution of a "focus of attention" mechanism to the performance of the word predictor. Finally, we describe a large scale experimental study in which the approach presented is shown to yield significant improvements in word prediction tasks.
A Sequential Model for Multi-Class Classification
- EMNLP ’01, 2001
- Cited by 32 (11 self)
Many classification problems require decisions among a large number of competing classes. These tasks, however, are not handled well by general purpose learning methods and are usually addressed in an ad-hoc fashion. We suggest a general approach – a sequential learning model that utilizes classifiers to sequentially restrict the number of competing classes while maintaining, with high probability, the presence of the true outcome in the candidates set. Some theoretical and computational properties of the model are discussed and we argue that these are important in NLP-like domains. The advantages of the model are illustrated in an experiment in part-of-speech tagging.
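The idea of sequentially restricting the set of competing classes might be sketched as below. This is only an illustration under stated assumptions: the stage functions `lexicon_stage` and `suffix_stage`, the tag set, and the lexicon are hypothetical, and hand-written rules stand in for the learned classifiers the abstract refers to. The probabilistic guarantee mentioned in the abstract is not modeled.

```python
def run_sequence(instance, candidates, stages):
    """Pass the candidate set through each stage, shrinking it step by step."""
    current = set(candidates)
    for stage in stages:
        kept = set(stage(instance, current)) & current
        if kept:  # a stage may narrow the set but never empty it
            current = kept
    return current

# Hypothetical tag set and lexicon for a toy part-of-speech example.
TAGS = {"DT", "NN", "VB", "VBG", "JJ"}
LEXICON = {"run": {"NN", "VB"}, "the": {"DT"}}

def lexicon_stage(word, candidates):
    """Keep only tags the lexicon allows; unknown words keep all candidates."""
    return candidates & LEXICON.get(word, candidates)

def suffix_stage(word, candidates):
    """Words ending in -ing are restricted to gerund or noun readings."""
    if word.endswith("ing"):
        return candidates & {"VBG", "NN"}
    return candidates

stages = [lexicon_stage, suffix_stage]
for word in ["the", "run", "singing"]:
    print(word, sorted(run_sequence(word, TAGS, stages)))
```

Each stage only needs to rule candidates out, so a final classifier would choose among a few remaining classes rather than the whole tag set.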
The disambiguation of nominalizations
- Computational Linguistics, 2002
- Cited by 23 (1 self)
This article addresses the interpretation of nominalizations, a particular class of compound nouns whose head noun is derived from a verb and whose modifier is interpreted as an argument of this verb. Any attempt to automatically interpret nominalizations needs to take into account: (a) the selectional constraints imposed by the nominalized compound head, (b) the fact that the relation of the modifier and the head noun can be ambiguous, and (c) the fact that these constraints can be easily overridden by contextual or pragmatic factors. The interpretation of nominalizations poses a further challenge for probabilistic approaches since the argument relations between a head and its modifier are not readily available in the corpus. Even an approximation that maps the compound head to its underlying verb provides insufficient evidence. We present an approach that treats the interpretation task as a disambiguation problem and show how we can “re-create” the missing distributional evidence by exploiting partial parsing, smoothing techniques, and contextual information. We combine these distinct information sources using Ripper, a system that learns sets of rules from data, and achieve an accuracy of 86.1% (over a baseline of 61.5%) on the British National Corpus.
On the Effectiveness of the Skew Divergence for Statistical Language Analysis
- In Artificial Intelligence and Statistics 2001, 2001
- Cited by 22 (0 self)
Estimating word co-occurrence probabilities is a problem underlying many applications in statistical natural language processing.
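The abstract above is cut off before it describes the measure named in the title. As the skew divergence is commonly defined, it is the KL divergence of one distribution from a mixture of the two, s_α(q, r) = D(r ‖ αq + (1 − α)r). The sketch below assumes that definition; the function names, the default α = 0.99, and the toy distributions are illustrative choices and are not taken from the listing.

```python
import math

def kl_divergence(p, q):
    """D(p || q) for distributions given as dicts; q must cover p's support."""
    return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0)

def skew_divergence(q, r, alpha=0.99):
    """s_alpha(q, r) = D(r || alpha*q + (1 - alpha)*r), assuming 0 <= alpha < 1."""
    support = set(q) | set(r)
    mixed = {
        x: alpha * q.get(x, 0.0) + (1 - alpha) * r.get(x, 0.0)
        for x in support
    }
    return kl_divergence(r, mixed)

# Toy co-occurrence distributions with disjoint support.
q = {"a": 1.0}
r = {"b": 1.0}
print(skew_divergence(q, r))  # finite, whereas D(r || q) would be infinite
```

Mixing a little of r into q keeps the denominator nonzero wherever r has mass, so the value stays finite even when the two distributions share no events, a common situation with sparse co-occurrence counts.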
The Disambiguation of Nominalisations
- Computational Linguistics, 2002
- Cited by 13 (2 self)
This paper addresses the interpretation of nominalisations, a particular class of compound nouns whose head noun is derived from a verb and whose modifier is interpreted as an argument of this verb. Any attempt to automatically interpret nominalisations needs to take into account: (a) the selectional constraints imposed by the nominalised compound head, (b) the fact that the relation of the modifier and the head noun can be ambiguous, and (c) the fact that these constraints can be easily overridden by contextual or pragmatic factors. The interpretation of nominalisations poses a further challenge for probabilistic approaches since the argument relations between a head and its modifier are not readily available in the corpus. Even an approximation which maps the compound head to its underlying verb provides insufficient evidence. We present an approach which treats the interpretation task as a disambiguation problem and show how we can "recreate" the missing distributional evidence by exploiting partial parsing, smoothing techniques, and contextual information. We combine these distinct information sources using Ripper, a system that learns sets of rules from data, and achieve an accuracy of 86.1% (over a baseline of 61.5%) on the British National Corpus.
Evaluating context features for medical relation mining
- In: ECML/PKDD Workshop on Data, 2003
- Cited by 2 (1 self)
The paper describes a set of experiments aimed at identifying and evaluating context features and machine learning methods to identify medical semantic relations in texts. We use manually constructed lists of pairs of MeSH-classes that represent specific relations, and a linguistically and semantically annotated corpus of medical abstracts to explore the contextual features of relations. Using hierarchical clustering we compare and evaluate linguistic aspects of relation context and different data representations. Through feature selection on a small data set we also show that relations are characterized by typical context words, and by isolating these we can construct a more robust language model representing the target relation. Finally, we present graph visualization as an alternative and promising way of data representation facilitating feature selection.
Use of a Lexical Feature Database for Partial Parsing of Chinese
The Grammatical Knowledge base of Contemporary Chinese contains detailed feature descriptions of the morphological and syntactic behavior of more than fifty thousand Chinese words. We have provided these features to machine learning modules controlling our partial parsers, and achieved significant improvements in performance, at a very modest cost in engineering effort, and under restrictions in the quantity of training data which make it difficult to directly use lexical identities.
Lexical Knowledge Representation with Contexonyms
- In: Proceedings of the 9th Machine Translation, 2003
Inter-word associations like stagger - drunken, or intra-word sense divisions (e.g. write a diary vs.

