An Empirical Study of Smoothing Techniques for Language Modeling, 1998
Cited by 850 (20 self)
Abstract:
We present an extensive empirical comparison of several smoothing techniques in the domain of language modeling, including those described by Jelinek and Mercer (1980), Katz (1987), and Church and Gale (1991). We investigate for the first time how factors such as training data size, corpus (e.g., Brown versus Wall Street Journal), and n-gram order (bigram versus trigram) affect the relative performance of these methods, which we measure through the cross-entropy of test data. In addition, we introduce two novel smoothing techniques, one a variation of Jelinek-Mercer smoothing and one a very simple linear interpolation technique, both of which outperform existing methods.
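The simple linear-interpolation idea the abstract mentions can be sketched as a bigram model mixed with a unigram fallback. The fixed weight `lam` and the toy corpus below are illustrative assumptions; Jelinek-Mercer smoothing tunes such weights on held-out data rather than fixing them by hand.

```python
from collections import Counter

def train_interpolated_bigram(tokens, lam=0.7):
    """Interpolated bigram model:
    P(w | prev) = lam * P_ML(w | prev) + (1 - lam) * P_ML(w).
    `lam` is a hypothetical fixed weight for this sketch."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    total = len(tokens)

    def prob(prev, w):
        p_uni = unigrams[w] / total
        p_bi = bigrams[(prev, w)] / unigrams[prev] if unigrams[prev] else 0.0
        return lam * p_bi + (1 - lam) * p_uni

    return prob

corpus = "the cat sat on the mat the cat ran".split()
p = train_interpolated_bigram(corpus)
# The unseen bigram ("mat", "ran") still gets nonzero mass from the unigram term.
assert p("mat", "ran") > 0
```

Because each component distribution is normalized, the mixture remains a proper distribution over the vocabulary for any seen context.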
Statistical Foundations for Default Reasoning, 1993
Cited by 45 (8 self)
Abstract:
We describe a new approach to default reasoning, based on a principle of indifference among possible worlds. We interpret default rules as extreme statistical statements, thus obtaining a knowledge base KB comprised of statistical and first-order statements. We then assign equal probability to all worlds consistent with KB in order to assign a degree of belief to a statement φ. The degree of belief can be used to decide whether to defeasibly conclude φ. Various natural patterns of reasoning, such as a preference for more specific defaults, indifference to irrelevant information, and the ability to combine independent pieces of evidence, turn out to follow naturally from this technique. Furthermore, our approach is not restricted to default reasoning; it supports a spectrum of reasoning, from quantitative to qualitative. It is also related to other systems for default reasoning. In particular, we show that the work of [Goldszmidt et al., 1990], which applies maximum entropy ideas t...
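The world-counting scheme described here can be illustrated with a toy propositional version: the degree of belief in φ is the fraction of KB-consistent truth assignments in which φ holds. The atoms and the hard reading of the default below are assumptions for the sketch, not the paper's first-order machinery.

```python
from itertools import product

def degree_of_belief(atoms, kb, query):
    """Degree of belief in `query` = fraction of truth assignments (worlds)
    consistent with `kb` in which `query` holds. `kb` and `query` are
    predicates over a dict mapping atom names to booleans."""
    worlds = [dict(zip(atoms, vals))
              for vals in product([False, True], repeat=len(atoms))]
    consistent = [w for w in worlds if kb(w)]
    if not consistent:
        raise ValueError("knowledge base is inconsistent")
    return sum(query(w) for w in consistent) / len(consistent)

# Toy KB: "bird -> flies", read as a hard constraint for this sketch.
belief = degree_of_belief(
    ["bird", "flies"],
    kb=lambda w: (not w["bird"]) or w["flies"],
    query=lambda w: w["flies"],
)
```

Three of the four worlds survive the constraint, and `flies` holds in two of them, so the sketch assigns belief 2/3 to `flies`.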
From Statistics to Beliefs, 1992
Cited by 43 (12 self)
Abstract:
An intelligent agent uses known facts, including statistical knowledge, to assign degrees of belief to assertions it is uncertain about. We investigate three principled techniques for doing this. All three are applications of the principle of indifference, because they assign equal degree of belief to all basic "situations" consistent with the knowledge base. They differ because there are competing intuitions about what the basic situations are. Various natural patterns of reasoning, such as the preference for the most specific statistical data available, turn out to follow from some or all of the techniques. This is an improvement over earlier theories, such as work on direct inference and reference classes, which arbitrarily postulate these patterns without offering any deeper explanations or guarantees of consistency. The three methods we investigate have surprising characterizations: there are connections to the principle of maximum entropy, a principle of maximal independence, an...
Verb Class Disambiguation Using Informative Priors
COMPUTATIONAL LINGUISTICS, 2004
Cited by 41 (4 self)
Abstract:
Levin’s (1993) study of verb classes is a widely used resource for lexical semantics. In her framework, some verbs, such as give, exhibit no class ambiguity. But other verbs, such as write, have several alternative classes. We extend Levin’s inventory to a simple statistical model of verb class ambiguity. Using this model we are able to generate preferences for ambiguous verbs without the use of a disambiguated corpus. We additionally show that these preferences are useful as priors for a verb sense disambiguator.
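A minimal sketch of class priors for an ambiguous verb: normalize per-class evidence counts into a distribution. The class names and counts below are hypothetical illustrations, not data from Levin's inventory or the paper.

```python
def class_priors(class_counts):
    """Turn per-class counts for an ambiguous verb into a prior
    distribution over its candidate classes. A toy sketch; the paper
    derives its preferences from Levin's class inventory."""
    total = sum(class_counts.values())
    return {cls: n / total for cls, n in class_counts.items()}

# Hypothetical candidate classes and counts for the ambiguous verb "write"
# (illustrative numbers only).
prior = class_priors({"Performance": 6, "Message Transfer": 3, "Scribble": 1})
```

Such a prior can then be combined with contextual evidence in a sense disambiguator, as the abstract suggests.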
Inductive influence
 British Journal for the Philosophy of Science
Cited by 7 (6 self)
Abstract:
Objective Bayesianism has been criticised for not allowing learning from experience: it is claimed that an agent must give degree of belief 1/2 to the next raven being black, however many other black ravens have been observed. I argue that this objection can be overcome by appealing to objective Bayesian nets, a formalism for representing objective Bayesian degrees of belief. Under this account, previous observations exert an inductive influence on the next observation. I show how this approach can be used to capture the Johnson-Carnap continuum of inductive methods, as well as the Nix-Paris continuum, and show how inductive influence can ...
Objective Bayesianism with predicate languages. Synthese, 2008
Cited by 5 (5 self)
Abstract:
Objective Bayesian probability is often defined over rather simple domains, e.g., finite event spaces or propositional languages. This paper investigates the extension of objective Bayesianism to first-order logical languages. It is argued that the objective Bayesian should choose a probability function, from all those that satisfy constraints imposed by background knowledge, that is closest to a particular frequency-induced probability function which generalises the λ = 0 function of Carnap’s continuum of inductive methods.
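For reference, Carnap's λ-continuum assigns, after observing n individuals of which n_j exhibit attribute j (out of κ possible attributes),

```latex
c_\lambda(\text{next has attribute } j \mid e) \;=\; \frac{n_j + \lambda/\kappa}{n + \lambda},
```

so the λ = 0 member mentioned above is the straight rule c_0 = n_j / n, while λ → ∞ gives the pure-indifference value 1/κ regardless of the evidence.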
A note on binary inductive logic
JOURNAL OF PHILOSOPHICAL LOGIC, 2007
Cited by 5 (1 self)
Abstract:
We consider the problem of induction over languages containing binary relations and outline a way of interpreting and constructing a class of probability functions on the sentences of such a language. Some principles of inductive reasoning satisfied by these probability functions are discussed, leading in turn to a representation theorem for a more general class of probability functions satisfying these principles.
Improved smoothing for probabilistic suffix trees seen as variable order Markov chains
IN EUROPEAN CONFERENCE ON MACHINE LEARNING (ECML), 2002
Cited by 5 (1 self)
Abstract:
In this paper, we compare Probabilistic Suffix Trees (PST), recently proposed, to a specific smoothing of Markov chains and show that they both induce the same model, namely a variable-order Markov chain. We show a weakness of PST in terms of smoothing and propose an enhanced smoothing. We show that the model based on enhanced smoothing outperforms the PST while needing fewer parameters on a protein domain detection task on public databases.
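The variable-order Markov chain view can be sketched as conditional distributions over progressively shorter context suffixes, smoothed by recursive interpolation down to a uniform base. The fixed weight `alpha` and the full-back-off rule for unseen contexts are assumptions of this sketch, not the paper's enhanced smoothing.

```python
from collections import Counter, defaultdict

def train_vomc(seq, max_order=2, alpha=0.5):
    """Variable-order Markov chain with recursive interpolation:
    P(c | ctx) = alpha * P_ML(c | ctx) + (1 - alpha) * P(c | shorter ctx),
    bottoming out at a uniform distribution over the alphabet.
    Unseen contexts back off entirely to the shorter suffix."""
    alphabet = sorted(set(seq))
    counts = defaultdict(Counter)   # context (tuple of symbols) -> next-symbol counts
    for order in range(max_order + 1):
        for i in range(order, len(seq)):
            counts[tuple(seq[i - order:i])][seq[i]] += 1

    def prob(ctx, c):
        ctx = tuple(ctx[-max_order:])
        total = sum(counts[ctx].values())
        if not ctx:  # empty context: mix unigram ML with the uniform base
            ml = counts[()][c] / total
            return alpha * ml + (1 - alpha) / len(alphabet)
        if total == 0:  # context never observed: back off entirely
            return prob(ctx[1:], c)
        ml = counts[ctx][c] / total
        return alpha * ml + (1 - alpha) * prob(ctx[1:], c)

    return prob

p = train_vomc("abracadabra")
# "abr" occurs in the training string while "abc" does not.
assert p(("a", "b"), "r") > p(("a", "b"), "c")
```

Each level mixes a normalized ML distribution with a normalized shorter-context distribution, so the result stays a proper distribution over the alphabet for any context.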
Grammar induction from text using small syntactic prototypes
In Proceedings of the 5th International Joint Conference on Natural Language Processing, 2011
Cited by 3 (0 self)
Abstract:
We present an efficient technique to incorporate a small number of cross-linguistic parameter settings defining default word orders into otherwise unsupervised grammar induction. A syntactic prototype, represented by the integrated model between Categorial Grammar and dependency structure, generated from the language parameters, is used to prune the search space. We also propose heuristics which prefer less complex syntactic categories to more complex ones in parse decoding. The system reduces errors generated by the state-of-the-art baselines for WSJ10 (1% error reduction of F1 score for the model trained on Sections 2–22 and tested on Section 23), Chinese10 (26% error reduction of F1), German10 (9% error reduction of F1), and Japanese10 (8% error reduction of F1), and is not significantly different from the baseline for Czech10.