Distinguishing Word Senses in Untagged Text
 In Proceedings of the Second Conference on Empirical Methods in Natural Language Processing
Abstract

Cited by 75 (17 self)
This paper describes an experimental comparison of three unsupervised learning algorithms that distinguish the sense of an ambiguous word in untagged text.
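The shared starting point of such unsupervised sense discrimination can be sketched as clustering of context vectors. The snippet below is a minimal, hypothetical illustration, not the paper's method: it only shows the step of representing each occurrence of an ambiguous word as a bag-of-words vector over its context and assigning it to the more similar of two reference contexts. All names and data are invented for the example.

```python
from collections import Counter

def context_vector(context_words, vocab):
    """Bag-of-words vector for one occurrence's surrounding context."""
    counts = Counter(context_words)
    return [counts[w] for w in vocab]

def cosine(u, v):
    """Cosine similarity between two count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0

def assign(vectors, centroids):
    """One assignment step: each context goes to its most similar centroid."""
    return [max(range(len(centroids)),
                key=lambda k: cosine(vec, centroids[k]))
            for vec in vectors]

# Hypothetical contexts for the ambiguous word "bank":
vocab = ["river", "water", "money", "loan"]
contexts = [["river", "water"], ["money", "loan"], ["water", "river"]]
vectors = [context_vector(c, vocab) for c in contexts]
labels = assign(vectors, [vectors[0], vectors[1]])  # seed with two contexts
```

A full discrimination algorithm would iterate or cluster agglomeratively rather than stop after one assignment step.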
The Design, Implementation and Use of the Ngram Statistics Package
 In Proceedings of the Fourth International Conference on Intelligent Text Processing and Computational Linguistics
, 2003
Abstract

Cited by 68 (2 self)
The Ngram Statistics Package (NSP) is a flexible and easy-to-use software tool that supports the identification and analysis of Ngrams, sequences of N tokens in online text. We have designed and implemented NSP to be easy to customize to particular problems and yet remain general enough to serve a broad range of needs. This paper provides an introduction to NSP while raising some general issues in Ngram analysis, and summarizes several applications where NSP has been successfully employed. NSP is written in Perl and is freely available under the GNU Public License.
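As a rough illustration of the core task NSP automates, the Python sketch below counts contiguous bigrams in a token list. NSP itself is a Perl package with far more machinery (non-contiguous windows, stop-lists, frequency cut-offs, and association measures); the tokens here are hypothetical.

```python
from collections import Counter

def count_ngrams(tokens, n=2):
    """Count contiguous N-token sequences in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

tokens = "the cat sat on the mat the cat slept".split()
bigrams = count_ngrams(tokens, 2)
```

The resulting counts are the raw material for the association measures discussed in the papers below.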
Fishing for Exactness
 In Proceedings of the South-Central SAS Users Group Conference
, 1996
Abstract

Cited by 50 (5 self)
Statistical methods for automatically identifying dependent word pairs (i.e. dependent bigrams) in a corpus of natural language text have traditionally been performed using asymptotic tests of significance. This paper suggests that Fisher's exact test is a more appropriate test due to the skewed and sparse data samples typical of this problem. Both theoretical and experimental comparisons between Fisher's exact test and a variety of asymptotic tests (the t-test, Pearson's chi-square test, and the likelihood-ratio chi-square test) are presented. These comparisons show that Fisher's exact test is more reliable in identifying dependent word pairs. The usefulness of Fisher's exact test extends to other problems in statistical natural language processing, as skewed and sparse data appear to be the rule in natural language. The experiment presented in this paper was performed using PROC FREQ of the SAS System. Introduction Due to advances in computing power and the increasing availability of l...
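The contrast the paper draws can be shown on a single sparse 2x2 bigram contingency table. The sketch below implements the one-sided Fisher's exact test directly from the hypergeometric distribution and, for comparison, the asymptotic Pearson chi-square p-value for one degree of freedom. The counts are hypothetical, and this is a stand-alone illustration rather than the paper's PROC FREQ experiment.

```python
from math import comb, erfc, sqrt

def fisher_exact_greater(a, b, c, d):
    """One-sided (right-tail) Fisher's exact test for the 2x2 table
    [[a, b], [c, d]]: the probability, with margins fixed, of a table
    at least as extreme as the one observed."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)
    return sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, min(row1, col1) + 1)) / denom

def pearson_chi2_p(a, b, c, d):
    """Asymptotic p-value of Pearson's chi-square test (1 d.f.),
    without continuity correction."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return erfc(sqrt(chi2 / 2))  # survival function of chi-square, 1 d.f.

# Hypothetical sparse counts; the exact and asymptotic p-values differ.
p_exact = fisher_exact_greater(3, 1, 1, 3)
p_asym = pearson_chi2_p(3, 1, 1, 3)
```

On small tables like this one the asymptotic approximation can be noticeably off, which is the paper's point.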
Sequential Model Selection for Word Sense Disambiguation
, 1997
Abstract

Cited by 29 (14 self)
Statistical models of word-sense disambiguation are often based on a small number of contextual features or on a model that is assumed to characterize the interactions among a set of features. Model selection is presented as an alternative to these approaches, where a sequential search of possible models is conducted in order to find the model that best characterizes the interactions among features. This paper expands existing model selection methodology and presents the first comparative study of model selection search strategies and evaluation criteria when applied to the problem of building probabilistic classifiers for word-sense disambiguation.
A New Supervised Learning Algorithm for Word Sense Disambiguation
 In Proceedings of the Fourteenth National Conference on Artificial Intelligence
Abstract

Cited by 26 (13 self)
The Naive Mix is a new supervised learning algorithm that is based on a sequential method for selecting probabilistic models. The usual objective of model selection is to find a single model that adequately characterizes the data in a training sample. However, during model selection a sequence of models is generated that consists of the best-fitting model at each level of model complexity. The Naive Mix utilizes this sequence of models to define a probabilistic model which is then used as a probabilistic classifier to perform word-sense disambiguation. The models in this sequence are restricted to the class of decomposable log-linear models. This class of models offers a number of computational advantages. Experiments disambiguating twelve different words show that a Naive Mix formulated with a forward sequential search and Akaike's Information Criterion rivals established supervised learning algorithms such as decision trees (C4.5), rule induction (CN2) and nearest-neighbor classif...
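The forward sequential search itself can be sketched generically. The snippet below is a greedy forward-selection skeleton that records the best feature set at each complexity level, mirroring the sequence of models the Naive Mix draws on; the score function is a hypothetical stand-in for an evaluation criterion such as AIC (lower is better), not the paper's actual model-fitting code.

```python
def forward_search(candidates, score):
    """Greedy forward selection: at each step add the candidate that most
    improves the score; return the best feature set at each size."""
    selected, sequence = [], []
    remaining = list(candidates)
    while remaining:
        best = min(remaining, key=lambda f: score(selected + [f]))
        if selected and score(selected + [best]) >= score(selected):
            break  # no remaining candidate improves the score
        remaining.remove(best)
        selected = selected + [best]
        sequence.append(list(selected))
    return sequence

# Hypothetical score: distance from an assumed 'true' feature set {"a", "b"}.
levels = forward_search(["a", "b", "c"], lambda fs: len(set(fs) ^ {"a", "b"}))
```

Where the Naive Mix differs from ordinary selection is that it keeps this whole sequence of models rather than only the final winner.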
The Measure of a Model
, 1996
Abstract

Cited by 26 (15 self)
This paper describes measures for evaluating the three determinants of how well a probabilistic classifier performs on a given test set. These determinants are the appropriateness, for the test set, of the results of (1) feature selection, (2) formulation of the parametric form of the model, and (3) parameter estimation. These are part of any model formulation procedure, even if not broken out as separate steps, so the tradeoffs explored in this paper are relevant to a wide variety of methods. The measures are demonstrated in a large experiment, in which they are used to analyze the results of roughly 300 classifiers that perform word-sense disambiguation. Introduction This paper presents techniques that can be used to analyze the formulation of a probabilistic classifier. As part of this presentation, we apply these techniques to the results of a large number of classifiers, developed using the methodology presented in (2), (3), (4), (5), (12) and (16), which tag words according to ...
On Log-Likelihood-Ratios and the Significance of Rare Events
 In Proceedings of EMNLP
, 2004
Abstract

Cited by 20 (0 self)
We address the issue of judging the significance of rare events as it typically arises in statistical natural-language processing. We first define a general approach to the problem, and we empirically compare results obtained using log-likelihood-ratios and Fisher’s exact test, applied to measuring strength of bilingual word associations.
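For reference, the log-likelihood-ratio statistic compared here is the usual G-squared for a contingency table. The sketch below computes it for a 2x2 word-association table with hypothetical counts; judging significance for rare events is exactly where the asymptotic chi-square approximation to this statistic becomes least trustworthy, motivating the comparison with Fisher's exact test.

```python
from math import log

def g_squared(table):
    """Log-likelihood-ratio statistic G^2 = 2 * sum O * ln(O / E) over the
    cells of a contingency table, with expected counts E from the margins."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    g = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            if obs > 0:  # zero cells contribute nothing to the sum
                expected = row_totals[i] * col_totals[j] / n
                g += obs * log(obs / expected)
    return 2.0 * g

# Hypothetical bilingual co-occurrence counts for a candidate word pair:
g = g_squared([[10, 20], [20, 4000]])
```

Under independence G-squared is asymptotically chi-square distributed with one degree of freedom for a 2x2 table, but that approximation degrades as the counts in a row or column become small.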
Extracting the lowest-frequency words: Pitfalls and possibilities
 Computational Linguistics
, 2000
Abstract

Cited by 11 (2 self)
In a medical information extraction system, we use common word association techniques to extract side-effect-related terms. Many of these terms have a frequency of less than five. Standard word-association-based applications disregard the lowest-frequency words, and hence disregard useful information. We therefore devised an extraction system for the full word frequency range. This system computes the significance of association by the log-likelihood ratio and Fisher’s exact test. The output of the system shows a recurrent, corpus-independent pattern in both recall and the number of significant words. We will explain these patterns by the statistical behavior of the lowest-frequency words. We used Dutch verb-particle combinations as a second and independent collocation extraction application to illustrate the generality of the observed phenomena. We will conclude that a) word-association-based extraction systems can be enhanced by also considering the lowest-frequency words, b) significance levels should not be fixed but adjusted for the optimal window size, c) hapax legomena, words occurring only once, should be disregarded a priori in the statistical analysis, and d) the distribution of the targets to extract should be considered in combination with the extraction method.
Probabilistic Event Categorization
Abstract

Cited by 5 (3 self)
This paper describes the automation of a new text categorization task. The categories assigned in this task are more syntactically, semantically, and contextually complex than those typically assigned by fully automatic systems that process unseen test data. Our system for assigning these categories uses a probabilistic classifier, developed with a recent method for formulating a probabilistic model from a predefined set of potential features (Bruce 1995, Bruce and Wiebe 1994, Pedersen et al. 1996). This paper focuses on feature selection. It presents various types of properties experimented with in this work. We identify and evaluate various approaches to organizing the collocational properties into features. With the more complex features we define, there is an organization that yields the best results; but the same organization with less complex features yields inferior results. The results suggest a way to take advantage of properties that are low frequency but ...