Dependencies
Cited by 1 (1 self)
Recent work in deduplication has shown that collective deduplication of different attribute types can improve performance. But although these techniques cluster the attributes collectively, they do not model them collectively. For example, in citations in the research literature, canonical venue strings and title strings are dependent—because venues tend to focus on a few research areas—but this dependence is not modeled by current unsupervised techniques. We call this dependence between fields in a record a cross-field dependence. In this paper, we present an unsupervised generative model for the deduplication problem that explicitly models cross-field dependence. Our model uses a single set of latent variables to control two disparate clustering models: a Dirichlet-multinomial model over titles, and a non-exchangeable string-edit model over venues. We show that modeling cross-field dependence yields a substantial improvement in performance—a 58 % reduction in error over a standard Dirichlet process mixture.
Shared Segmentation of Natural Scenes Using Dependent Pitman-Yor Processes
We develop a statistical framework for the simultaneous, unsupervised segmentation and discovery of visual object categories from image databases. Examining a large set of manually segmented scenes, we show that object frequencies and segment sizes both follow power law distributions, which are well modeled by the Pitman–Yor (PY) process. This nonparametric prior distribution leads to learning algorithms which discover an unknown set of objects, and segmentation methods which automatically adapt their resolution to each image. Generalizing previous applications of PY processes, we use Gaussian processes to discover spatially contiguous segments which respect image boundaries. Using a novel family of variational approximations, our approach produces segmentations which compare favorably to state-of-the-art methods, while simultaneously discovering categories shared among natural scenes.
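As a rough illustration of the Pitman–Yor prior named in this abstract, the following minimal sketch simulates its standard Chinese-restaurant-style seating rule: a customer joins an existing table k with probability proportional to (n_k - discount) and opens a new table with probability proportional to (concentration + discount * K). The function name and parameter names are hypothetical, and the sketch covers only the prior, not the paper's segmentation model or its variational inference.

```python
import random


def pitman_yor_crp(n_customers, discount, concentration, seed=0):
    """Seat customers one at a time and return the list of table sizes.

    Illustrative sketch of the Pitman-Yor seating rule (names are assumptions).
    """
    if not 0.0 <= discount < 1.0 or concentration <= -discount:
        raise ValueError("need 0 <= discount < 1 and concentration > -discount")
    rng = random.Random(seed)
    tables = []  # tables[k] = number of customers at table k
    for n in range(n_customers):
        # With n customers seated, the unnormalized masses sum to n + concentration.
        new_table_mass = concentration + discount * len(tables)
        r = rng.uniform(0.0, n + concentration)
        if not tables or r < new_table_mass:
            tables.append(1)  # open a new table
            continue
        r -= new_table_mass
        for k, size in enumerate(tables):
            r -= size - discount  # existing table k has mass (size - discount)
            if r < 0.0:
                tables[k] += 1
                break
        else:
            tables[-1] += 1  # guard against floating-point round-off
    return tables


if __name__ == "__main__":
    sizes = pitman_yor_crp(2000, discount=0.5, concentration=1.0)
    print(len(sizes), "tables; largest sizes:", sorted(sizes, reverse=True)[:5])
```

With a discount of 0 the rule reduces to the Dirichlet process; larger discounts tend to produce many small tables alongside a few large ones, which is the heavy-tailed behavior the abstract refers to.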
U N I V E R S
Recent years have seen the emergence of multiple stochastic language and grammar models that use Pitman-Yor processes as Bayesian priors. Thus far, these models have proved very effective for NLP tasks that involve unsupervised inference. The aim of this project is to investigate semi-supervised learning methods and to test their applicability and effectiveness on this class of language models. The original methods have to be adapted, as the usual semi-supervised inference with Expectation-Maximization (EM) is not applicable. The alternative is to fall back on Gibbs sampling. The two major contributions are novel variants of the Stable Mixing method (Corduneanu and Jaakkola, 2002) and of Active Learning (Cohn et al., 1996). Unlike the originals, the new methods are applicable to Gibbs-based inference methods. For an exemplary word separation task on the Bernstein Ratner corpus, the new Stable Mixing variant improves the word F0-score by 9 % in comparison to the baseline approach.
Acknowledgements: I want to express my gratitude to my project supervisor, Miles Osborne, for his support
LEVERAGING STRUCTURAL INFORMATION FOR STATISTICAL TOPIC MODELS OF TEXT
, 2009
Permission is herewith granted to Dalhousie University to circulate and to have copied for non-commercial purposes, at its discretion, the above title upon the request of individuals or institutions. Signature of Author. The author reserves other publication rights, and neither the thesis nor extensive extracts from it may be printed or otherwise reproduced without the author’s written permission. The author attests that permission has been obtained for the use of any copyrighted material appearing in the thesis (other than brief excerpts requiring only proper acknowledgement in scholarly writing) and that all such use is clearly acknowledged.
I dedicate this to my family, Zahra and Taha, for their love, help, and patience
INTERSPEECH 2010 Learning a Language Model from Continuous Speech
This paper presents a new approach to language model construction, learning a language model not from text, but directly from continuous speech. A phoneme lattice is created using acoustic model scores, and Bayesian techniques are used to robustly learn a language model from this noisy input. A novel sampling technique is devised that allows for the integrated learning of word boundaries and an n-gram language model with no prior linguistic knowledge. The proposed techniques were used to learn a language model directly from continuous, potentially large-vocabulary speech. This language model was able to significantly reduce the ASR phoneme error rate over a separate set of test data, and the proposed lattice processing and lexical acquisition techniques were found to be important factors in this improvement. Index Terms: language acquisition, word segmentation, Pitman-Yor language model, Bayesian learning
Hierarchical Pitman-Yor Language Model for Information Retrieval
In this paper, we propose a new application of a Bayesian language model based on the Pitman-Yor process for information retrieval. This model is a generalization of the Dirichlet distribution. The Pitman-Yor process creates a power-law distribution, which is one of the statistical properties of word frequency in natural language. Our experiments on Robust04 indicate that this model improves document retrieval performance compared to the commonly used Dirichlet prior and absolute discounting smoothing techniques.
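For context on the two baselines this abstract compares against, here is a minimal sketch of the textbook forms of Dirichlet prior smoothing and absolute discounting for a document language model. It does not implement the paper's Pitman-Yor model; the function names, the toy document, and the values of `mu` and `delta` are assumptions chosen for illustration.

```python
from collections import Counter


def dirichlet_prior_prob(word, doc_counts, doc_len, coll_prob, mu=1000.0):
    """p(w|d) = (c(w,d) + mu * p(w|C)) / (|d| + mu)."""
    return (doc_counts.get(word, 0) + mu * coll_prob[word]) / (doc_len + mu)


def absolute_discount_prob(word, doc_counts, doc_len, coll_prob, delta=0.7):
    """p(w|d) = max(c(w,d) - delta, 0)/|d| + (delta * |d|_u / |d|) * p(w|C),
    where |d|_u is the number of distinct words in the document."""
    discounted = max(doc_counts.get(word, 0) - delta, 0.0) / doc_len
    backoff_weight = delta * len(doc_counts) / doc_len
    return discounted + backoff_weight * coll_prob[word]


if __name__ == "__main__":
    doc = "a a b".split()                         # toy document (assumption)
    counts = Counter(doc)
    coll = {"a": 0.5, "b": 0.25, "c": 0.25}       # toy collection model (assumption)
    for w in coll:
        print(w,
              dirichlet_prior_prob(w, counts, len(doc), coll, mu=1.0),
              absolute_discount_prob(w, counts, len(doc), coll, delta=0.5))
```

Both estimators reserve probability mass for unseen words and hand it to the collection model; they differ in whether the amount reserved depends on document length (Dirichlet prior) or on the number of distinct words (absolute discounting).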
Separating Precision and Mean in Dirichlet-enhanced High-order Markov Models
Robustly estimating the state-transition probabilities of high-order Markov processes is an essential task in many applications such as natural language modeling or protein sequence modeling. We propose a novel estimation algorithm called Hierarchical Separated Dirichlet Smoothing (HSDS), where Dirichlet distributions are hierarchically assumed to be the prior distributions of the state-transition probabilities. The key idea in HSDS is to separate the parameters of a Dirichlet distribution into the precision and mean, so that the precision depends on the context while the mean is given by the lower-order distribution. HSDS is designed to outperform Kneser-Ney smoothing especially when the number of states is small, where Kneser-Ney smoothing is currently known as the state-of-the-art technique for N-gram natural language models. Our experiments in protein sequence modeling showed the superiority of HSDS both in perplexity evaluation and classification tasks.
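The precision/mean separation described in this abstract can be illustrated with the standard posterior predictive of a Dirichlet prior whose parameters are written as precision times mean. This is only a sketch of that parameterization: the abstract does not say how HSDS estimates the context-dependent precision, so the value below is hand-picked, and the function name and toy counts are hypothetical.

```python
def dirichlet_predictive(counts, mean, precision):
    """Posterior predictive under a Dirichlet prior with alpha_s = precision * mean[s]:

        p(s | context) = (count[s] + precision * mean[s]) / (total + precision)

    `mean` plays the role of the lower-order distribution; `precision`
    controls how strongly the estimate is pulled toward it.
    """
    total = sum(counts.values())
    return {
        s: (counts.get(s, 0) + precision * mean[s]) / (total + precision)
        for s in mean
    }


if __name__ == "__main__":
    counts = {"A": 3, "C": 1}                        # toy context counts (assumption)
    lower_order = {s: 0.25 for s in "ACGT"}          # toy lower-order mean (assumption)
    print(dirichlet_predictive(counts, lower_order, precision=4.0))
```

A small precision leaves the estimate close to the observed relative frequencies, while a large precision pulls it toward the lower-order distribution.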
Probabilistic Deterministic Infinite Automata
We propose a novel Bayesian nonparametric approach to learning with probabilistic deterministic finite automata (PDFA). We define and develop a sampler for a PDFA with an infinite number of states which we call the probabilistic deterministic infinite automata (PDIA). Posterior predictive inference in this model, given a finite training sequence, can be interpreted as averaging over multiple PDFAs of varying structure, where each PDFA is biased towards having few states. We suggest that our method for averaging over PDFAs is a novel approach to predictive distribution smoothing. We test PDIA inference on PDFA structure learning and on prediction tasks with both natural language and DNA data. The results suggest that the PDIA presents an attractive compromise between the computational cost of hidden Markov models and the storage requirements of hierarchically smoothed Markov models.
Gatsby Computational
Probabilistic models of sequences play a central role in most machine translation, automated speech recognition, lossless compression, spell-checking, and gene identification applications, to name but a few. Unfortunately, real-world sequence data often exhibit long range dependencies which can only be captured by computationally challenging, complex models. Sequence data arising from natural processes also often exhibit power-law properties, yet common sequence models do not capture such properties. The sequence memoizer is a new hierarchical Bayesian model for discrete sequence data that captures long range dependencies and power-law characteristics while remaining computationally attractive. Its utility as a language model and general purpose lossless compressor is demonstrated.

