Bayes Factors
, 1995
Cited by 717 (65 self)
In a 1935 paper, and in his book Theory of Probability, Jeffreys developed a methodology for quantifying the evidence in favor of a scientific theory. The centerpiece was a number, now called the Bayes factor, which is the posterior odds of the null hypothesis when the prior probability on the null is one-half. Although there has been much discussion of Bayesian hypothesis testing in the context of criticism of P-values, less attention has been given to the Bayes factor as a practical tool of applied statistics. In this paper we review and discuss the uses of Bayes factors in the context of five scientific applications in genetics, sports, ecology, sociology and psychology.
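As a minimal illustration of the definition in this abstract (not taken from the paper), the sketch below computes a Bayes factor for a binomial experiment, comparing a point null p = 0.5 against an alternative that places a uniform prior on p. The function names and the choice of a uniform prior are assumptions made for the example.

```python
from math import comb


def bayes_factor_null(k: int, n: int, p0: float = 0.5) -> float:
    """Bayes factor B01 for H0: p = p0 versus H1: p ~ Uniform(0, 1).

    k successes were observed in n trials.
    """
    # Marginal likelihood under H0: the binomial probability at p0.
    m0 = comb(n, k) * p0**k * (1 - p0) ** (n - k)
    # Marginal likelihood under H1: integrating the binomial likelihood
    # against a uniform prior on p gives 1 / (n + 1) for every k.
    m1 = 1.0 / (n + 1)
    return m0 / m1


def posterior_prob_null(b01: float, prior_null: float = 0.5) -> float:
    """Posterior probability of H0 given the Bayes factor and a prior."""
    prior_odds = prior_null / (1 - prior_null)
    posterior_odds = b01 * prior_odds
    return posterior_odds / (1 + posterior_odds)
```

With a prior probability of one-half the prior odds are 1, so the posterior odds of the null equal the Bayes factor itself, which matches the description in the abstract.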
Beyond Market Baskets: Generalizing Association Rules To Dependence Rules
, 1998
Cited by 414 (5 self)
One of the more well-studied problems in data mining is the search for association rules in market basket data. Association rules are intended to identify patterns of the type: “A customer purchasing item A often also purchases item B.” Motivated partly by the goal of generalizing beyond market basket data and partly by the goal of ironing out some problems in the definition of association rules, we develop the notion of dependence rules that identify statistical dependence in both the presence and absence of items in itemsets. We propose measuring significance of dependence via the chi-squared test for independence from classical statistics. This leads to a measure that is upward-closed in the itemset lattice, enabling us to reduce the mining problem to the search for a border between dependent and independent itemsets in the lattice. We develop pruning strategies based on the closure property and thereby devise an efficient algorithm for discovering dependence rules. We demonstrate our algorithm’s effectiveness by testing it on census data, text data (wherein we seek term dependence), and synthetic data.
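The chi-squared test for independence mentioned in this abstract can be sketched for the simplest case, a pair of items. The code below is only an illustration: the function names, the 2x2 layout, and the 95% significance cutoff are assumptions, and it does not reproduce the paper's lattice search or pruning.

```python
def chi_squared_2x2(both: int, a_only: int, b_only: int, neither: int) -> float:
    """Chi-squared statistic for independence of items A and B.

    The four arguments count baskets by presence/absence of each item.
    Assumes every row and column total is non-zero.
    """
    observed = [[both, a_only], [b_only, neither]]
    n = both + a_only + b_only + neither
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if A and B were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat


# Critical value of the chi-squared distribution with 1 degree of
# freedom at the 95% level (an assumed cutoff for this sketch).
CRITICAL_95_1DOF = 3.841


def is_dependent(both: int, a_only: int, b_only: int, neither: int) -> bool:
    """True if the test rejects independence at the assumed cutoff."""
    return chi_squared_2x2(both, a_only, b_only, neither) > CRITICAL_95_1DOF
```

Because the statistic uses all four cells, it reacts to the absence of items as well as their presence, which is the distinction the abstract draws with ordinary association rules.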
Unsupervised word sense disambiguation rivaling supervised methods
- In Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics
, 1995
Cited by 383 (4 self)
This paper presents an unsupervised learning algorithm for sense disambiguation that, when trained on unannotated English text, rivals the performance of supervised techniques that require time-consuming hand annotations. The algorithm is based on two powerful constraints -- that words tend to have one sense per discourse and one sense per collocation -- exploited in an iterative bootstrapping procedure. Tested accuracy exceeds 96%.
Word-Sense Disambiguation Using Statistical Models of Roget's Categories Trained on Large Corpora
, 1992
Cited by 265 (10 self)
This paper describes a program that disambiguates English word senses in unrestricted text using statistical models of the major Roget's Thesaurus categories. Roget's categories serve as approximations of conceptual classes. The categories listed for a word in Roget's index tend to correspond to sense distinctions; thus selecting the most likely category provides a useful level of sense disambiguation. The selection of categories is accomplished by identifying and weighting words that are indicative of each category when seen in context, using a Bayesian theoretical framework. Other
Word Sense Disambiguation Using a Second Language Monolingual Corpus
- Computational Linguistics
, 1994
Cited by 129 (1 self)
This paper presents a new approach for resolving lexical ambiguities in one language using statistical data from a monolingual corpus of another language. This approach exploits the differences between mappings of words to senses in different languages. The paper concentrates on the problem of target word selection in machine translation, for which the approach is directly applicable. The presented algorithm identifies syntactic relationships between words, using a source language parser, and maps the alternative interpretations of these relationships to the target language, using a bilingual lexicon. The preferred senses are then selected according to statistics on lexical relations in the target language. The selection is based on a statistical model and on a constraint propagation algorithm, which handles simultaneously all ambiguities in the sentence. The method was evaluated using three sets of Hebrew and German examples and was found to be very useful for disambiguation. The paper includes a detailed comparative analysis of statistical sense disambiguation methods.
1. Introduction
The resolution of lexical ambiguities in non-restricted text is one of the most difficult tasks of natural language processing. A related task in machine translation, on which we focus in this paper, is target word selection. This is the task of deciding which target language word is the most appropriate equivalent of a source language word in context. In addition to the alternatives introduced by the different word senses of the source language word, the target language may specify additional alternatives that differ mainly in their usage. Traditionally several linguistic levels were used to deal with this problem: syntactic, semantic and pragmatic. Computationally the syntactic methods...
Decision Lists For Lexical Ambiguity Resolution: Application to Accent Restoration in Spanish and French
, 1994
Cited by 126 (3 self)
This paper presents a statistical decision procedure for lexical ambiguity resolution. The algorithm exploits both local syntactic patterns and more distant collocational evidence, generating an efficient, effective, and highly perspicuous recipe for resolving a given ambiguity. By identifying and utilizing only the single best disambiguating evidence in a target context, the algorithm avoids the problematic complex modeling of statistical dependencies. Although directly applicable to a wide class of ambiguities, the algorithm is described and evaluated in a realistic case study, the problem of restoring missing accents in Spanish and French text. Current accuracy exceeds 99% on the full task, and typically is over 90% for even the most difficult ambiguities.
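The idea of deciding from only the single best piece of evidence can be sketched as a decision list: rules are ranked by how strongly they favor one reading, and the first rule that matches the context decides. The sketch below is a simplified illustration under assumed details (the feature strings, the smoothing constant `alpha`, and the two-label setup are hypothetical), not the paper's exact procedure.

```python
from collections import defaultdict
from math import log


def build_decision_list(examples, labels, alpha=0.1):
    """Build a ranked rule list from (feature_set, label) training pairs.

    labels is a pair of the two competing readings. Each rule is a tuple
    (strength, feature, predicted_label), sorted strongest first.
    """
    first, second = labels
    counts = defaultdict(lambda: {first: 0, second: 0})
    for features, label in examples:
        for feature in features:
            counts[feature][label] += 1
    rules = []
    for feature, c in counts.items():
        # Smoothed log-likelihood ratio; alpha avoids division by zero.
        llr = log((c[first] + alpha) / (c[second] + alpha))
        # A ratio of exactly zero falls to the second label by convention.
        rules.append((abs(llr), feature, first if llr > 0 else second))
    rules.sort(key=lambda rule: rule[0], reverse=True)
    return rules


def classify(rules, features, default):
    """Return the label of the strongest rule whose feature is present."""
    for _, feature, label in rules:
        if feature in features:
            return label
    return default
```

Since only the top matching rule is used, weaker and possibly correlated evidence is never combined, which is how this style of classifier sidesteps modeling dependencies between features.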
Introduction to the Special Issue on Computational Linguistics using Large Corpora
- Computational Linguistics
, 1993
Modeling Score Distributions for Combining the Outputs of Search Engines
, 2001
Cited by 72 (4 self)
In this paper the score distributions of a number of text search engines are modeled. It is shown empirically that the score distributions on a per query basis may be fitted using an exponential distribution for the set of non-relevant documents and a normal distribution for the set of relevant documents. Experiments show that this model fits TREC-3 and TREC-4 data for not only probabilistic search engines like INQUERY but also vector space search engines like SMART for English. We have also used this model to fit the output of other search engines like LSI search engines and search engines indexing other languages like Chinese. It is then shown that given a query for which relevance information is not available, a mixture model consisting of an exponential and a normal distribution can be fitted to the score distribution. These distributions can be used to map the scores of a search engine to probabilities. We also discuss how the shape of the score distributions arises given certain assumptions about word distributions in documents. We hypothesize that all 'good' text search engines operating on any language have similar characteristics. This model has many possible applications. For example, the outputs of different search engines can be combined by averaging the probabilities (optimal if the search engines are independent) or by using the probabilities to select the best engine for each query. Results show that the technique performs as well as the best current combination techniques.
This material is based on work supported in part by the National Science Foundation, Library of Congress and Department of Commerce under cooperative agreement number EEC-9209623, in part by the National Science Foundation under grant numbers IRI-9619117 and IIS-9909073, in part by N...
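The mapping from scores to probabilities can be sketched with Bayes' rule once the mixture has been fitted. The code below assumes the mixture parameters are already known (the fitting step is omitted), and all names and parameter values are hypothetical.

```python
from math import exp, pi, sqrt


def exponential_pdf(score: float, rate: float) -> float:
    """Density assumed for the scores of non-relevant documents."""
    return rate * exp(-rate * score) if score >= 0 else 0.0


def normal_pdf(score: float, mean: float, std: float) -> float:
    """Density assumed for the scores of relevant documents."""
    z = (score - mean) / std
    return exp(-0.5 * z * z) / (std * sqrt(2 * pi))


def prob_relevant(score, mix_rel, mean, std, rate):
    """Posterior probability of relevance for one score via Bayes' rule.

    mix_rel is the mixing weight of the relevant (normal) component.
    """
    rel = mix_rel * normal_pdf(score, mean, std)
    non = (1 - mix_rel) * exponential_pdf(score, rate)
    return rel / (rel + non)


def combine_by_averaging(probabilities):
    """Combine per-engine relevance probabilities for one document."""
    return sum(probabilities) / len(probabilities)
```

Each engine would have its own fitted parameters per query; averaging is then done on the resulting probabilities rather than on the raw scores, which are not comparable across engines.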
Bayes factors and model uncertainty
- DEPARTMENT OF STATISTICS, UNIVERSITY OF WASHINGTON
, 1993
Cited by 70 (6 self)
In a 1935 paper, and in his book Theory of Probability, Jeffreys developed a methodology for quantifying the evidence in favor of a scientific theory. The centerpiece was a number, now called the Bayes factor, which is the posterior odds of the null hypothesis when the prior probability on the null is one-half. Although there has been much discussion of Bayesian hypothesis testing in the context of criticism of P-values, less attention has been given to the Bayes factor as a practical tool of applied statistics. In this paper we review and discuss the uses of Bayes factors in the context of five scientific applications. The points we emphasize are:
- from Jeffreys's Bayesian point of view, the purpose of hypothesis testing is to evaluate the evidence in favor of a scientific theory;
- Bayes factors offer a way of evaluating evidence in favor of a null hypothesis;
- Bayes factors provide a way of incorporating external information into the evaluation of evidence about a hypothesis;
- Bayes factors are very general, and do not require alternative models to be nested;
- several techniques are available for computing Bayes factors, including asymptotic approximations which are easy to compute using the output from standard packages that maximize likelihoods;
- in "non-standard" statistical models that do not satisfy common regularity conditions, it can be technically simpler to calculate Bayes factors than to derive non-Bayesian significance
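One well-known asymptotic approximation that needs only maximized log-likelihoods is the Schwarz (BIC) criterion. Whether it is the specific approximation the authors have in mind is an assumption here; the sketch simply shows how little is required beyond standard package output. The function and argument names are hypothetical.

```python
from math import log


def log_bayes_factor_schwarz(loglik1, loglik0, d1, d0, n):
    """Schwarz approximation to log B10 (model 1 against model 0).

    loglik1, loglik0: maximized log-likelihoods of the two models.
    d1, d0: numbers of free parameters in each model.
    n: sample size.
    """
    # Gain in fit, penalized by the extra parameters of model 1.
    return (loglik1 - loglik0) - 0.5 * (d1 - d0) * log(n)
```

A positive value favors model 1 and a negative value favors model 0; no prior needs to be specified, which is the price paid for the approximation being rough.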
Using Bilingual Materials to Develop Word Sense Disambiguation Methods
, 1992
Cited by 69 (2 self)
Word sense disambiguation has been recognized as a major problem in natural language processing research for over forty years. Much of this work has been stymied by difficulties in acquiring appropriate lexical resources, such as semantic networks and annotated corpora. Following the suggestion in Brown et al. (1991a) and Dagan et al. (1991), we have achieved considerable progress recently by taking advantage of a new source of testing and training materials. Rather than depending on small amounts of hand-labeled text, we have been making use of relatively large amounts of parallel text, text such as the Canadian Hansards (parliamentary debates), which are available in two (or more) languages. The translation can often be used in lieu of hand-labeling. For example, consider the polysemous word "sentence", which has two major senses: (1) a judicial sentence, and (2) a syntactic sentence. We can collect a number of sense (1) examples by extracting instances that are translated as "peine", and we can collect a number of sense (2) examples by extracting instances that are translated as "phrase". In this way, we have been able to acquire a considerable amount of testing and training material for developing and testing our disambiguation algorithms. The availability of this testing and training material has enabled us to develop quantitative disambiguation methods that achieve 90% accuracy in discriminating between two very distinct senses of a noun such as

