Automatic Word Sense Discrimination
- Journal of Computational Linguistics
, 1998
Cited by 272 (0 self)
This paper presents context-group discrimination, a disambiguation algorithm based on clustering. Senses are interpreted as groups (or clusters) of similar contexts of the ambiguous word. Words, contexts, and senses are represented in Word Space, a high-dimensional, real-valued space in which closeness corresponds to semantic similarity. Similarity in Word Space is based on second-order co-occurrence: two tokens (or contexts) of the ambiguous word are assigned to the same sense cluster if the words they co-occur with in turn occur with similar words in a training corpus. The algorithm is automatic and unsupervised in both training and application: senses are induced from a corpus without labeled training instances or other external knowledge sources. The paper demonstrates good performance of context-group discrimination for a sample of natural and artificial ambiguous words.
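To make the idea of second-order co-occurrence concrete, here is a minimal sketch on a toy corpus. It is not the paper's implementation: the function names, the window size, the toy sentences, and the use of raw counts with cosine similarity are all assumptions made for illustration, and the clustering step that would group contexts into senses is left out.

```python
import math
from collections import Counter, defaultdict

def first_order_vectors(sentences, window=2):
    """For each word, count the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

def context_vector(context_words, vectors):
    """Second-order vector: sum of the first-order vectors of the context words."""
    total = Counter()
    for w in context_words:
        total.update(vectors.get(w, {}))
    return total

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical training corpus: "money" and "cash" never co-occur directly,
# but both occur with "loan" and "interest".
corpus = [
    "money loan interest".split(),
    "cash loan interest".split(),
    "river water shore".split(),
    "stream water shore".split(),
]
vectors = first_order_vectors(corpus)

# Three hypothetical contexts of an ambiguous word such as "bank".
money_ctx = context_vector(["money"], vectors)
cash_ctx = context_vector(["cash"], vectors)
river_ctx = context_vector(["river"], vectors)

same_sense = cosine(money_ctx, cash_ctx)    # high: neighbours share co-occurrences
other_sense = cosine(money_ctx, river_ctx)  # zero: no shared co-occurrences
```

The two financial contexts come out as near neighbours even though their words never appear together, which is the property the abstract attributes to second-order co-occurrence.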
Corpus-based induction of syntactic structure: Models of dependency and constituency
- In Proceedings of the 42nd Annual Meeting of the ACL
, 2004
Cited by 128 (8 self)
We present a generative model for the unsupervised learning of dependency structures. We also describe the multiplicative combination of this dependency model with a model of linear constituency. The product model outperforms both components on their respective evaluation metrics, giving the best published figures for unsupervised dependency parsing and unsupervised constituency parsing. We also demonstrate that the combined model works and is robust cross-linguistically, being able to exploit either attachment or distributional regularities that are salient in the data.
Part-of-Speech Tagging and Partial Parsing
- Corpus-Based Methods in Language and Speech
, 1996
Cited by 85 (0 self)
m we can carve off next. `Partial parsing' is a cover term for a range of different techniques for recovering some but not all of the information contained in a traditional syntactic analysis. Partial parsing techniques, like tagging techniques, aim for reliability and robustness in the face of the vagaries of natural text, by sacrificing completeness of analysis and accepting a low but non-zero error rate. 1 Tagging The earliest taggers [35, 51] had large sets of hand-constructed rules for assigning tags on the basis of words' character patterns and on the basis of the tags assigned to preceding or following words, but they had only small lexica, primarily for exceptions to the rules. TAGGIT [35] was used to generate an initial tagging of the Brown corpus, which was then hand-edited. (Thus it provided the data that has since been used to train other taggers [20].) The tagger described by Garside [56, 34], CLAWS, was a probabilistic version of TAGGIT, and the DeRose tagger improved on
Distributional Information: A Powerful Cue for Acquiring Syntactic Categories
- COGNITIVE SCIENCE
, 1998
Cited by 85 (2 self)
Many theorists have dismissed a priori the idea that distributional information could play a significant role in syntactic category acquisition. We demonstrate empirically that such information provides a powerful cue to syntactic category membership, which can be exploited by a variety of simple, psychologically plausible mechanisms. We present a range of results using a large corpus of child-directed speech and explore their psychological implications. While our results show that a considerable amount of information concerning the syntactic categories can be obtained from distributional information alone, we stress that many other sources of information may also be potential contributors to the identification of syntactic classes.
Distributional Part-of-Speech Tagging
- In Proc. of 7th Conference of the European Chapter of the Association for Computational Linguistics
, 1995
Cited by 75 (6 self)
This paper presents an algorithm for tagging words whose part-of-speech properties are unknown. Unlike previous work, the algorithm categorizes word tokens in context instead of word types. The algorithm is evaluated on the Brown Corpus.
A Generative Constituent-Context Model for Improved Grammar Induction
, 2002
Cited by 72 (3 self)
We present a generative distributional model for the unsupervised induction of natural language syntax which explicitly models constituent yields and contexts.
Statistical methods and linguistics
- THE BALANCING ACT: COMBINING SYMBOLIC AND STATISTICAL APPROACHES TO LANGUAGE
, 1996
Cited by 72 (0 self)
In the space of the last ten years, statistical methods have gone from being virtually unknown in computational linguistics to being a fundamental given. In 1996, no one can profess to be a computational linguist without a passing knowledge of statistical methods. HMM's are as de rigueur as LR tables, and anyone who cannot at least use the terminology persuasively risks being mistaken for kitchen help at the ACL banquet. More seriously, statistical techniques have brought significant advances in broad-coverage language processing. Statistical methods have made real progress possible on a number of issues that had previously stymied attempts to liberate systems from toy domains: issues that include disambiguation, error correction, and the induction of the sheer volume of information requisite for handling unrestricted text. And the sense of progress has generated a great deal of enthusiasm for statistical methods in computational linguistics. However, this enthusiasm has not been catching in linguistics proper. It is always dangerous to generalize about linguists, but I think it is fair to say
Visualizing Association Rules for Text Mining
, 1999
Cited by 32 (1 self)
An association rule in data mining is an implication of the form X → Y where X is a set of antecedent items and Y is the consequent item. For years researchers have developed many tools to visualize association rules. However, few of these tools can handle more than dozens of rules, and none of them can effectively manage rules with multiple antecedents. Thus, it is extremely difficult to visualize and understand the association information of a large data set even when all the rules are available. This paper presents a novel visualization technique to tackle many of these problems. We apply the technology to a text mining study on large corpora. The results indicate that our design can easily handle hundreds of multiple antecedent association rules in a three-dimensional display with minimum human interaction, low occlusion percentage, and no screen swapping. Keywords: Text Visualization, Information Visualization, Text Mining, Data Mining, Association Rule
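As a small illustration of a rule with multiple antecedents, the sketch below scores one rule over a toy set of documents using support and confidence. These two measures are the standard ones from the association-rule literature and are not stated in the abstract; the function names and the toy documents are hypothetical.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) for the rule antecedent -> consequent."""
    s_x = support(antecedent, transactions)
    if s_x == 0:
        return 0.0
    return support(set(antecedent) | {consequent}, transactions) / s_x

# Hypothetical text-mining data: each document reduced to its set of topic words.
docs = [
    {"stock", "market", "price"},
    {"stock", "market"},
    {"stock", "price"},
    {"weather"},
]

# Rule with two antecedents: {stock, market} -> price
rule_support = support({"stock", "market", "price"}, docs)
rule_confidence = confidence({"stock", "market"}, "price", docs)
```

Here the antecedent set appears in two of the four documents and the full rule in one, so the rule holds in half of the cases where its antecedents occur.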
The Automated Evaluation of Inferred Word Classifications
- In Eleventh European Conference on Artificial Intelligence
, 1994
Cited by 18 (9 self)
Although automatically inferring classifications of words has been attempted by many researchers recently, no formal attempts to evaluate their results were made. Instead they relied on a "looks good to me" intuitive self-evaluation. We outline a method by which automated word classification techniques can be fairly compared. The process by which words are automatically grouped into classes involves a number of decision points. The experiments selected a set of options for many of the decision points and rated each combination of the factors so that the most successful approach can be found. We directly compare some of the adopted approaches of other researchers with the set of factors that were found to produce the most linguistically plausible classification in our experiments. The evaluation method is also shown to be a valuable aid to highlighting approaches that are inefficient. 1 Hierarchical Clustering to Cluster Words Hierarchical clustering is a way to produce a taxonomic cl...
The Direct Route: Mediated Priming in Semantic Space
, 2000
Cited by 13 (2 self)
McKoon and Ratcliff (1992) presented a theory of mediated priming where the priming effect is due to a direct but weak relatedness between prime and target. They also introduced a quantitative measure of word relatedness based on pointwise mutual information (Church and Hanks, 1990), and showed that stimuli chosen with the measure produced graded priming effects as predicted by their theory. Using stimuli from Balota and Lorch (1986), Livesay and Burgess (1998a,b) replicated the mediated priming effect in humans, but found that in HAL, a corpus-derived semantic space (Lund et al., 1995), mediated primes were in fact further from their targets than unrelated words. They concluded from this that mediated priming is not due to direct but weak relatedness. In this paper we present an alternative semantic space model based on earlier work (McDonald and Lowe, 1998). We show how this space allows a) a detailed replication of Ratcliff and McKoon's experimental results using the...
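The relatedness measure mentioned above, pointwise mutual information, compares how often two words occur together with how often they would be expected to if they were independent. The sketch below uses the standard definition, log2 of P(x, y) / (P(x) P(y)), estimated from raw counts. The function name, its parameters, and the example counts are assumptions for illustration and are not taken from the papers cited.

```python
import math

def pmi(count_xy, count_x, count_y, total):
    """Pointwise mutual information in bits, estimated from corpus counts.

    count_xy: number of times x and y co-occur
    count_x, count_y: individual counts of x and y
    total: number of observations the counts are drawn from
    """
    if count_xy == 0:
        return float("-inf")  # never observed together
    # Algebraically equal to log2((count_xy/total) / ((count_x/total) * (count_y/total))).
    return math.log2(count_xy * total / (count_x * count_y))

# Hypothetical counts: two words that co-occur ten times more often than chance.
related = pmi(count_xy=10, count_x=100, count_y=100, total=10_000)

# Hypothetical counts: co-occurrence exactly at the level expected under independence.
unrelated = pmi(count_xy=1, count_x=100, count_y=100, total=10_000)
```

A positive score means the pair co-occurs more often than independence predicts, and a score of zero means it co-occurs exactly as often as predicted.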

