An Empirical Study of Smoothing Techniques for Language Modeling, 1998
Cited by 631 (19 self)
We present an extensive empirical comparison of several smoothing techniques in the domain of language modeling, including those described by Jelinek and Mercer (1980), Katz (1987), and Church and Gale (1991). We investigate for the first time how factors such as training data size, corpus (e.g., Brown versus Wall Street Journal), and n-gram order (bigram versus trigram) affect the relative performance of these methods, which we measure through the cross-entropy of test data. In addition, we introduce two novel smoothing techniques, one a variation of Jelinek-Mercer smoothing and one a very simple linear interpolation technique, both of which outperform existing methods.
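As a rough illustration of the kind of model and metric this abstract refers to, the sketch below interpolates a maximum-likelihood bigram estimate with a unigram estimate and scores test data by cross-entropy. It is a generic fixed-weight linear interpolation in the Jelinek-Mercer style, not either of the paper's novel techniques. The function names, the weight `lam=0.7`, the add-one unigram floor, and the toy corpus are all assumptions made for this example.

```python
import math
from collections import Counter

def train_bigram(tokens):
    """Collect the counts needed for an interpolated bigram model."""
    unigrams = Counter(tokens)                  # c(w)
    contexts = Counter(tokens[:-1])             # c(v) counted as a left context
    bigrams = Counter(zip(tokens, tokens[1:]))  # c(v, w)
    return unigrams, contexts, bigrams

def interpolated_prob(word, prev, model, lam=0.7):
    """P(word | prev) = lam * P_ML(word | prev) + (1 - lam) * P_uni(word)."""
    unigrams, contexts, bigrams = model
    total = sum(unigrams.values())
    vocab_size = len(unigrams) + 1  # assumption: one extra slot for unseen words
    # Add-one unigram estimate so that unseen words keep non-zero probability.
    p_uni = (unigrams[word] + 1) / (total + vocab_size)
    if contexts[prev] == 0:
        return p_uni  # unseen context: fall back to the unigram estimate
    p_bi = bigrams[(prev, word)] / contexts[prev]
    return lam * p_bi + (1 - lam) * p_uni

def cross_entropy(test_tokens, model, lam=0.7):
    """Average negative log2 probability per predicted token (bits per token)."""
    pairs = list(zip(test_tokens, test_tokens[1:]))
    log_sum = sum(math.log2(interpolated_prob(w, v, model, lam)) for v, w in pairs)
    return -log_sum / len(pairs)

# Toy usage: lower cross-entropy means the model predicts the test text better.
model = train_bigram("the cat sat on the mat the cat ate".split())
print(cross_entropy("the cat sat on the rug".split(), model))
```

In practice the interpolation weight would be tuned on held-out data rather than fixed by hand.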
Building Probabilistic Models for Natural Language, 1996
Cited by 60 (1 self)
Building models of language is a central task in natural language processing. Traditionally, language has been modeled with manually-constructed grammars that describe which strings are grammatical and which are not; however, with the recent availability of massive amounts of on-line text, statistically-trained models are an attractive alternative. These models are generally probabilistic, yielding a score reflecting sentence frequency instead of a binary grammaticality judgement. Probabilistic models of language are a fundamental tool in speech recognition for resolving acoustically ambiguous utterances. For example, we prefer the transcription forbear to four bear as the former string is far more frequent in English text. Probabilistic models also have application in optical character recognition, handwriting recognition, spelling correction, part-of-speech tagging, and machine translation. In this thesis, we investigate three problems involving the probabilistic modeling of languag...
Statistical Foundations for Default Reasoning, 1993
Cited by 43 (8 self)
We describe a new approach to default reasoning, based on a principle of indifference among possible worlds. We interpret default rules as extreme statistical statements, thus obtaining a knowledge base KB comprised of statistical and first-order statements. We then assign equal probability to all worlds consistent with KB in order to assign a degree of belief to a statement φ. The degree of belief can be used to decide whether to defeasibly conclude φ. Various natural patterns of reasoning, such as a preference for more specific defaults, indifference to irrelevant information, and the ability to combine independent pieces of evidence, turn out to follow naturally from this technique. Furthermore, our approach is not restricted to default reasoning; it supports a spectrum of reasoning, from quantitative to qualitative. It is also related to other systems for default reasoning. In particular, we show that the work of [Goldszmidt et al., 1990], which applies maximum entropy ideas t...
From Statistics to Beliefs, 1992
Cited by 40 (12 self)
An intelligent agent uses known facts, including statistical knowledge, to assign degrees of belief to assertions it is uncertain about. We investigate three principled techniques for doing this. All three are applications of the principle of indifference, because they assign equal degree of belief to all basic "situations" consistent with the knowledge base. They differ because there are competing intuitions about what the basic situations are. Various natural patterns of reasoning, such as the preference for the most specific statistical data available, turn out to follow from some or all of the techniques. This is an improvement over earlier theories, such as work on direct inference and reference classes, which arbitrarily postulate these patterns without offering any deeper explanations or guarantees of consistency. The three methods we investigate have surprising characterizations: there are connections to the principle of maximum entropy, a principle of maximal independence, an...
Verb Class Disambiguation Using Informative Priors
Computational Linguistics, 2004
Cited by 29 (4 self)
Levin’s (1993) study of verb classes is a widely used resource for lexical semantics. In her framework, some verbs, such as give, exhibit no class ambiguity. But other verbs, such as write, have several alternative classes. We extend Levin’s inventory to a simple statistical model of verb class ambiguity. Using this model we are able to generate preferences for ambiguous verbs without the use of a disambiguated corpus. We additionally show that these preferences are useful as priors for a verb sense disambiguator.
Improved smoothing for probabilistic suffix trees seen as variable order Markov chains
In European Conference on Machine Learning (ECML), 2002
Cited by 5 (1 self)
In this paper, we compare the recently proposed Probabilistic Suffix Trees (PST) to a specific smoothing of Markov chains and show that they both induce the same model, namely a variable order Markov chain. We show a weakness of PST in terms of smoothing and propose to use an enhanced smoothing. We show that the model based on enhanced smoothing outperforms the PST while needing fewer parameters on a protein domain detection task on public databases.
Inductive influence
British Journal for the Philosophy of Science
Cited by 4 (3 self)
Objective Bayesianism has been criticised for not allowing learning from experience: it is claimed that an agent must give degree of belief 1/2 to the next raven being black, however many other black ravens have been observed. I argue that this objection can be overcome by appealing to objective Bayesian nets, a formalism for representing objective Bayesian degrees of belief. Under this account, previous observations exert an inductive influence on the next observation. I show how this approach can be used to capture the Johnson-Carnap continuum of inductive methods, as well as the Nix-Paris continuum, and show how inductive influence can
Objective Bayesianism with predicate languages
Synthese, 2008
Cited by 3 (3 self)
Objective Bayesian probability is often defined over rather simple domains, e.g., finite event spaces or propositional languages. This paper investigates the extension of objective Bayesianism to first-order logical languages. It is argued that the objective Bayesian should choose a probability function, from all those that satisfy constraints imposed by background knowledge, that is closest to a particular frequency-induced probability function which generalises the λ = 0 function of Carnap’s continuum of inductive methods.
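For orientation, the λ = 0 function mentioned in this abstract can be read off the standard statement of Carnap's continuum, sketched below. The notation is an assumption for this note and is not taken from the paper: k is the number of mutually exclusive types an individual can fall under, n is the number of individuals observed so far, and n_i is how many of them were of type i.

```latex
% Carnap's continuum: degree of belief that the next individual is of type i,
% given that n_i of the n individuals observed so far were of type i.
c_\lambda(\text{next is of type } i \mid n_i \text{ of } n)
  = \frac{n_i + \lambda / k}{n + \lambda},
  \qquad 0 \le \lambda \le \infty .

% Setting \lambda = 0 (with n > 0) gives the observed relative frequency:
c_0(\text{next is of type } i \mid n_i \text{ of } n) = \frac{n_i}{n}.
```

Under this reading, λ = 0 sets the degree of belief equal to the observed frequency, which fits the abstract's description of the generalised function as frequency-induced.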
On the Emergence of Reasons in Inductive Logic
Journal of the IGPL, 2001
Cited by 2 (2 self)
We apply methods of abduction derived from propositional probabilistic reasoning to predicate probabilistic reasoning, in particular inductive logic, by treating finite predicate knowledge bases as potentially infinite propositional knowledge bases. It is shown that for a range of predicate knowledge bases (such as those typically associated with inductive reasoning) and several key propositional inference processes (in particular the Maximum Entropy Inference Process) this procedure is well defined, and furthermore yields an explanation for the validity of the induction in terms of 'reasons'.
Keywords: Inductive Logic, Probabilistic Reasoning, Abduction, Maximum Entropy, Uncertain Reasoning.
1 Motivation
Consider the following situation. I am sitting by a bend in a road and I start to wonder how likely it is that the next car which passes will skid on this bend. I have some knowledge which seems relevant, for example I know that if there is ice on the road then there is a good chance of a skid, and similarly if the bend is unsigned, the camber adverse, etc. I possibly also have some knowledge of how likely it is that there is ice on the road, how likely it is that the bend is unsigned (possibly conditioned on the iciness of the road), etc. Notice that this is generic knowledge which applies equally to any potential passing car.

