Results 11-20 of 143
Productivity and Reuse in Language
2011
Cited by 18 (4 self)

Abstract
We present a Bayesian model of the mirror-image problems of linguistic productivity and reuse. The model, known as Fragment Grammar, is evaluated against several morphological datasets; its performance is compared to competing theoretical accounts including full-parsing, full-listing, and exemplar-based models. The model is able to learn the correct patterns of productivity and reuse for two very different systems: the English past tense, which is characterized by a sharp dichotomy in productivity between regular and irregular forms, and English derivational morphology, which is characterized by a graded cline from very productive (-ness) to very unproductive (-th). Keywords: productivity; reuse; storage; computation; Bayesian model; past tense; derivational morphology
Lossless compression based on the Sequence Memoizer
In Data Compression Conference, 2010
Cited by 15 (4 self)

Abstract
In this work we describe a sequence compression method based on combining a Bayesian nonparametric sequence model with entropy encoding. The model, a hierarchy of Pitman-Yor processes of unbounded depth previously proposed by Wood et al. [2009] in the context of language modelling, allows modelling of long-range dependencies by allowing conditioning contexts of unbounded length. We show that incremental approximate inference can be performed in this model, thereby allowing it to be used in a text compression setting. The resulting compressor reliably outperforms several PPM variants on many types of data, but is particularly effective in compressing data that exhibits power-law properties.
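As a rough illustration of the kind of predictive rule such Pitman-Yor models rest on, the sketch below computes the predictive distribution of a single Pitman-Yor restaurant. It is a deliberate simplification (one table per observed symbol type, uniform base distribution), not the paper's unbounded-depth hierarchy or its incremental inference scheme.

```python
from collections import Counter

def pitman_yor_predictive(counts, d, alpha, vocab_size):
    """Predictive distribution of a single Pitman-Yor restaurant with
    discount d and concentration alpha, backing off to a uniform base
    distribution over vocab_size symbols.
    Simplifying assumption: one table per observed symbol type."""
    n = sum(counts.values())      # total observations
    t = len(counts)               # occupied tables (one per type here)
    base = 1.0 / vocab_size
    probs = {}
    for sym in range(vocab_size):
        c = counts.get(sym, 0)
        probs[sym] = (max(c - d, 0.0) + (alpha + d * t) * base) / (n + alpha)
    return probs

# Three observations over a 3-symbol alphabet: symbol 0 twice, symbol 1 once.
p = pitman_yor_predictive(Counter([0, 0, 1]), d=0.5, alpha=1.0, vocab_size=3)
```

The discount d is what produces power-law behaviour: it shaves probability mass off every observed type and redistributes it to the base distribution, which matches the paper's observation that such models are particularly effective on power-law data.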
Training continuous space language models: some practical issues
Cited by 14 (0 self)

Abstract
Using multilayer neural networks to estimate the probabilities of word sequences is a promising research area in statistical language modeling, with applications in speech recognition and statistical machine translation. However, training such models for large-vocabulary tasks is computationally challenging and does not scale easily to the huge corpora that are nowadays available. In this work, we study the performance and behavior of two neural statistical language models so as to highlight some important caveats of the classical training algorithms. The induced word embeddings for extreme cases are also analysed, thus providing insight into the convergence issues. A new initialization scheme and new training techniques are then introduced. These methods are shown to greatly reduce the training time and to significantly improve performance, both in terms of perplexity and on a large-scale translation task.
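Perplexity, the intrinsic metric used above, is computed directly from per-token log-probabilities; a minimal sketch (the uniform-model example is illustrative, not taken from the paper):

```python
import math

def perplexity(log_probs):
    """Corpus perplexity from per-token natural-log probabilities:
    exp(-mean log p). Lower is better."""
    return math.exp(-sum(log_probs) / len(log_probs))

# A uniform model over a 4-word vocabulary assigns log(1/4) to every token,
# so its perplexity equals the vocabulary size, 4.
uniform_lp = [math.log(0.25)] * 10
pp = perplexity(uniform_lp)
```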
The Ornstein-Uhlenbeck Dirichlet process and other time-varying processes for Bayesian nonparametric inference
Journal of Statistical Planning and Inference, 2011
Cited by 13 (2 self)
Unsupervised and Constrained Dirichlet Process Mixture Models for Verb Clustering
Cited by 13 (6 self)

Abstract
In this work, we apply Dirichlet Process Mixture Models (DPMMs) to a learning task in natural language processing (NLP): lexical-semantic verb clustering. We thoroughly evaluate a method of guiding DPMMs towards a particular clustering solution using pairwise constraints. The quantitative and qualitative evaluation performed highlights the benefits of both standard and constrained DPMMs compared to previously used approaches. In addition, it sheds light on the use of evaluation measures and their practical application.
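A minimal sketch of the Chinese restaurant process prior that drives cluster assignments in a DPMM (the unconstrained prior only; the pairwise-constraint mechanism evaluated in the paper is not shown):

```python
import random

def crp_assign(cluster_sizes, alpha, rng=None):
    """Sample a cluster for a new item under the Chinese restaurant process
    prior: existing cluster k with probability n_k / (n + alpha), a brand-new
    cluster with probability alpha / (n + alpha)."""
    rng = rng or random.Random(0)
    weights = list(cluster_sizes) + [alpha]
    r = rng.random() * (sum(cluster_sizes) + alpha)
    acc = 0.0
    for k, w in enumerate(weights):
        acc += w
        if r < acc:
            return k  # k == len(cluster_sizes) means "open a new cluster"
    return len(cluster_sizes)
```

In a full DPMM Gibbs sampler this prior weight would be multiplied by each cluster's data likelihood before sampling; the prior alone already shows why the number of clusters need not be fixed in advance.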
Continuous Space Translation Models with Neural Networks
Cited by 12 (0 self)

Abstract
The use of conventional maximum likelihood estimates hinders the performance of existing phrase-based translation models. For lack of sufficient training data, most models only consider a small amount of context. As a partial remedy, we explore here several continuous space translation models, where translation probabilities are estimated using a continuous representation of translation units in lieu of standard discrete representations. In order to handle a large set of translation units, these representations and the associated estimates are jointly computed using a multilayer neural network with a SOUL architecture. In small-scale and large-scale English-to-French experiments, we show that the resulting models can effectively be trained and used on top of an n-gram translation system, delivering significant improvements in performance.
Stream-based joint exploration-exploitation active learning
In CVPR, 2012
Cited by 12 (3 self)

Abstract

(Snippet unavailable: the extracted text contains only residue from a Chinese restaurant process figure of table-seating probabilities.)
A Hierarchical Nonparametric Bayesian Approach to Statistical Language Model Domain Adaptation
Cited by 12 (0 self)

Abstract
In this paper we present a doubly hierarchical Pitman-Yor process language model. Its bottom layer of hierarchy consists of multiple hierarchical Pitman-Yor process language models, one for each of some number of domains. The novel top layer of hierarchy consists of a mechanism to couple together multiple language models such that they share statistical strength. Intuitively, this sharing results in the “adaptation” of a latent shared language model to each domain. We introduce a general formalism capable of describing the overall model, which we call the graphical Pitman-Yor process, and explain how to perform Bayesian inference in it. We present encouraging language model domain adaptation results that both illustrate the potential benefits of our new model and suggest new avenues of inquiry.
Characterizations of exchangeable partitions and random discrete distributions by deletion properties
2009
Cited by 10 (4 self)

Abstract
We prove a long-standing conjecture which characterises the Ewens-Pitman two-parameter family of exchangeable random partitions, plus a short list of limit and exceptional cases, by the following property: for each n = 2, 3, ..., if one of n individuals is chosen uniformly at random, independently of the random partition π_n of these individuals into various types, and all individuals of the same type as the chosen individual are deleted, then for each r > 0, given that r individuals remain, these individuals are partitioned according to π′_r for some sequence of random partitions (π′_r) that does not depend on n or r. An analogous result characterizes the associated Poisson-Dirichlet family of random discrete distributions by an independence property related to random deletion of a frequency chosen by a size-biased pick. We also survey the regenerative properties of members of the two-parameter family, and settle a question regarding the explicit arrangement of intervals with lengths given by the terms of the Poisson-Dirichlet random sequence into the interval partition induced by the range of a neutral-to-the-right process.
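The sequential construction of the Ewens-Pitman two-parameter family can be sketched as a Chinese-restaurant-style sampler. The parameter names below (alpha for the discount, theta for the concentration) follow common convention and are assumptions, not notation taken from the paper:

```python
import random

def ewens_pitman_partition(n, alpha, theta, seed=0):
    """Sample a partition of n individuals from the Ewens-Pitman
    two-parameter family via its sequential seating rule: individual i
    joins an existing block of size b with probability
    (b - alpha) / (i - 1 + theta), and starts a new block with probability
    (theta + alpha * k) / (i - 1 + theta), where k is the current number
    of blocks. Requires 0 <= alpha < 1 and theta > -alpha."""
    rng = random.Random(seed)
    blocks = []
    for i in range(1, n + 1):
        k = len(blocks)
        weights = [b - alpha for b in blocks] + [theta + alpha * k]
        r = rng.random() * (i - 1 + theta)
        acc, choice = 0.0, k
        for j, w in enumerate(weights):
            acc += w
            if r < acc:
                choice = j
                break
        if choice == k:
            blocks.append(1)   # open a new block
        else:
            blocks[choice] += 1
    return blocks

# Block sizes of one sampled partition of 50 individuals.
sizes = ewens_pitman_partition(50, alpha=0.5, theta=1.0)
```

Setting alpha = 0 recovers the one-parameter Ewens family; positive alpha yields the heavier-tailed regime associated with the Poisson-Dirichlet distributions discussed in the abstract.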
Unsupervised deduplication using cross-field dependencies
In KDD, Las Vegas, 2008
Cited by 9 (1 self)

Abstract
Recent work in deduplication has shown that collective deduplication of different attribute types can improve performance. But although these techniques cluster the attributes collectively, they do not model them collectively. For example, in citations in the research literature, canonical venue strings and title strings are dependent (because venues tend to focus on a few research areas), but this dependence is not modeled by current unsupervised techniques. We call this dependence between fields in a record a cross-field dependence. In this paper, we present an unsupervised generative model for the deduplication problem that explicitly models cross-field dependence. Our model uses a single set of latent variables to control two disparate clustering models: a Dirichlet-multinomial model over titles, and a non-exchangeable string-edit model over venues. We show that modeling cross-field dependence yields a substantial improvement in performance: a 58% reduction in error over a standard Dirichlet process mixture.
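A toy sketch of cross-field dependence through a single latent variable: one cluster choice jointly fixes a venue string and a topical title word, so the two fields are correlated through the cluster. The cluster definitions are hypothetical illustrative data, and this deliberately simplified generator stands in for, but is not, the paper's Dirichlet-multinomial and string-edit components.

```python
import random

def generate_record(clusters, rng=None):
    """Toy generator: one latent cluster z jointly selects a venue string
    and a topical title word, so the two fields are dependent through z."""
    rng = rng or random.Random(0)
    z = rng.randrange(len(clusters))
    venue, title_words = clusters[z]
    return {"venue": venue, "title_word": rng.choice(title_words), "cluster": z}

# Hypothetical clusters: each couples a venue with its typical title vocabulary.
clusters = [
    ("NIPS", ["bayesian", "nonparametric", "inference"]),
    ("ACL",  ["parsing", "translation", "morphology"]),
]
rec = generate_record(clusters)
```

Because venue and title word are generated from the same z, observing one field is informative about the other, which is exactly the dependence the paper argues unsupervised deduplication models should capture.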