## Transformation Process Priors

### BibTeX

@misc{Andrews_transformationprocess,
  author = {Nicholas Andrews},
  title  = {Transformation Process Priors},
  year   = {}
}

### Citations

777 | A Bayesian analysis of some nonparametric problems - Ferguson - 1973

> Context: “...Key idea. As a prior over discrete distributions over X, it is common to use a Dirichlet or Dirichlet process [9]. However, because of the neutrality property, these priors cannot capture correlations among the probabilities of “similar” events. We propose obtaining the discrete distribution from a random walk m...”

502 | Syntactic Structures - Chomsky - 1957

92 | Nouns in WordNet: a lexical inheritance system - Miller - 1993

> Context: “...on process with a log-linear parameterization of δ [7], a nonparametric version would allow us to take a proper Bayesian approach to the infinite set of parameters. Word sense disambiguation. WordNet [13] is a directed acyclic graph whose leaves are words and whose internal nodes are meanings. A path through the graph successively specializes a meaning until it arrives at a word: ENTITY → PHYSICAL ENT...”

66 | A realistic transformational grammar - Bresnan - 1978

36 | A topic model for word sense disambiguation - Boyd-Graber, Blei, et al. - 2007

> Context: “...rits from multiple parents. Suppose we wish to use WordNet to help estimate a joint distribution P(context, word). We may use a transformation process whose structure is drawn from the WordNet graph [5, 2]—or more precisely, a separate copy of that graph for each context c. A hierarchical prior allows us to learn which sub-concepts of (e.g.) PHYSICAL ENTITY are common overall, and which are unusually p...”

32 | Density modeling and clustering using Dirichlet diffusion trees - Neal

> Context: “...ent events, there are many applications where the inferred sequences of transformations have a meaningful interpretation (examples below). Relation to prior work. Similar to Dirichlet diffusion trees [14] (or more generally Pitman-Yor diffusion trees [12]), transformation processes model correlations via latent structure. (A certain nonparametric transformation process prior on X = ℝ gives a discrete ...”

32 | Sampling the Dirichlet mixture model with slices - Walker - 2007

> Context: “...ibutions. (We may also optimize hyperparameters.) Inference via MCMC sampling is possible using a slice sampler that works directly with stick-breaking representations rather than collapsing them out [16]. To sample latent paths from the posterior, auxiliary variables are also sampled at each vertex. Conditioned on these auxiliary variables, the sampler state consists of a finite portion of the graph,...”
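The slice-sampling trick mentioned above introduces an auxiliary uniform variable that makes only finitely many sticks relevant at each step. A minimal sketch of that truncation idea (illustrative only, not the paper's full sampler; in Walker's scheme the slice level u is itself drawn as Uniform(0, p_z), which guarantees the candidate set is nonempty):

```python
import random

def slice_sample_stick(alpha, u, rng=random):
    """Given a slice level u in (0, 1), instantiate sticks p_1, p_2, ...
    from GEM(alpha) only until the leftover mass drops below u: past
    that point no stick can exceed u, so the candidate set
    {j : p_j > u} is complete and finite."""
    weights, remaining = [], 1.0
    while remaining > u:
        beta = rng.betavariate(1.0, alpha)  # stick proportion
        weights.append(beta * remaining)
        remaining *= (1.0 - beta)
    candidates = [j for j, p in enumerate(weights) if p > u]
    # Fallback for this standalone sketch; unreachable in the real
    # sampler, where u <= p_z for the current assignment z.
    return rng.choice(candidates) if candidates else 0

j = slice_sample_stick(alpha=1.0, u=0.05)
print(j)
```

Conditioned on u, the sampler only ever touches a finite portion of the infinite stick-breaking representation, matching the “finite portion of the graph” remark in the snippet.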

21 | Population genetics theory - the past and the future - Ewens - 1988

> Context: “...utions, and the transformation probabilities are therefore given by $\delta(x' \mid x) = \sum_{j=1}^{\infty} p_j \, \mathbf{1}(y_j = x')$ (2), where the $y_j$ are drawn IID from a (usually sparse) base distribution $\delta_0(x' \mid x)$, and $p_j \sim \mathrm{GEM}(\alpha_x)$ [8]. The base distributions may themselves be drawn from a hierarchical prior or share parameters. Posterior inference. Given a dataset of observations x ∼ P, posterior inference consists of imputing th...”
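The stick-breaking construction in equation (2) above is straightforward to simulate. A minimal truncated sketch, assuming a small finite event set and a uniform base distribution (both illustrative choices, not from the paper):

```python
import random

def stick_breaking_transition(alpha, base_support, truncation=1000, rng=random):
    """Draw delta(. | x) = sum_j p_j 1(y_j = .) with p ~ GEM(alpha),
    truncated at a fixed number of sticks. Each atom y_j is drawn IID
    from a (here uniform) base distribution over base_support."""
    delta = {x: 0.0 for x in base_support}
    remaining = 1.0
    for _ in range(truncation):
        beta = rng.betavariate(1.0, alpha)  # stick proportion
        p_j = beta * remaining              # weight of stick j
        remaining *= (1.0 - beta)
        y_j = rng.choice(base_support)      # atom from the base distribution
        delta[y_j] += p_j
    # Spread the tiny leftover mass uniformly so delta sums to one.
    for x in base_support:
        delta[x] += remaining / len(base_support)
    return delta

delta = stick_breaking_transition(alpha=2.0, base_support=["a", "b", "c"])
print(delta)
```

Because many atoms y_j coincide, the resulting delta concentrates on a few events, which is the usual sparsity effect of a small alpha.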

17 | Latent-variable modeling of string transductions with finite-state methods - Dreyer, Smith, et al. - 2008 |

13 | The discrete infinite logistic normal distribution for mixed-membership modeling - Paisley, Wang, et al.

> Context: “...he Gaussian process, such as the discrete infinite logistic normal distribution, model correlations explicitly via the parameters of a covariance function, but do not reconstruct any latent structure [15]. The transformation process is especially appropriate as a prior for discrete distributions where latent structure plausibly exists in the form of derivational history. For instance, a suitable trans...”

11 | Smoothing a Probabilistic Lexicon Via Syntactic Transformations - Eisner - 2001

> Context: “...trality property, these priors cannot capture correlations among the probabilities of “similar” events. We propose obtaining the discrete distribution from a random walk model or transformation model [7], in which each observed event has evolved via a latent sequence of transformations. The transformation model is specified by a collection δ of conditional distributions (the transformations), so plac...”

9 | A hierarchical nonparametric Bayesian approach to statistical language model domain adaptation - Wood, Teh - 2009 |

8 | Unsupervised Deduplication using Cross-field Dependencies - Hall, Sutton, et al. - 2008

> Context: “...; alternatively, the flow may be approximated using a relaxation algorithm [7]. 3 Examples of Transformation Processes. String variation. Strings, such as genetic sequences [17], bibliographic entries [10], and proper names, may undergo mutation when they are copied. Given a collection of strings, we hope to infer their evolutionary history (and thereby cluster strings with a common ancestor), by fitti...”
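The string-variation example above (names or bibliographic entries mutating as they are copied) can be sketched with a toy character-level mutation model. The edit probabilities below are illustrative assumptions, not estimates from the paper:

```python
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def mutate(s, p_sub=0.05, p_del=0.02, p_ins=0.02, rng=random):
    """Copy a string with random character-level edits: a toy stand-in
    for a transformation distribution delta(s' | s) on strings."""
    out = []
    for ch in s:
        r = rng.random()
        if r < p_del:
            continue                          # delete this character
        if r < p_del + p_sub:
            out.append(rng.choice(ALPHABET))  # substitute a random character
        else:
            out.append(ch)                    # copy faithfully
        if rng.random() < p_ins:
            out.append(rng.choice(ALPHABET))  # insert after this position
    return "".join(out)

# Simulate a short chain of copies, i.e. one latent derivational history.
s = "ferguson"
history = [s]
for _ in range(3):
    s = mutate(s)
    history.append(s)
print(history)
```

Inference would run in the opposite direction: given only the leaves of many such chains, recover the tree of copies, clustering strings that share an ancestor.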

6 | The Gaussian process density sampler - Adams, Murray, et al.

> Context: “...isfy δ(x | x′) > 0, where the coefficients are determined by the conditional probabilities δ(x | x′), δ(□ | x), and δ(□ | x′). This is similar to the way that in Gaussian process density estimation [1], with covariance matrix Σ, each log P(x) is a linear combination of the log P(x′) values for x′ that satisfy $(\Sigma^{-1})_{xx'} \neq 0$. Although $\Sigma^{-1}$ is symmetric, unlike our conditional probabilities, it i...”

4 | Gibbs sampling methods for stick-breaking priors - Ishwaran, James - 2001

> Context: “...rt the Gaussian process model of log P(x) to a Bayesian network in which each log P(x) is a linear combination of the log P(x′) values for the parents x′ of x. We place stick-breaking priors [11] over the transition distributions, and the transformation probabilities are therefore given by $\delta(x' \mid x) = \sum_{j=1}^{\infty} p_j \, \mathbf{1}(y_j = x')$ (2), where the $y_j$ are drawn IID from a (usually sparse) base distribution δ...”

3 | Pitman-Yor diffusion trees - Knowles, Ghahramani

> Context: same passage as the Neal (Dirichlet diffusion trees) entry above, continuing: “...(A certain nonparametric transformation process prior on X = ℝ gives a discrete analogue of Dirichlet diffusion trees.) Methods bas...”

1 | Finite-state Dirichlet allocation: Learned priors on finite-state models - Cui, Eisner - 2006

> Context: same passage as the Boyd-Graber et al. (2007) entry above, which cites [5, 2] together.

1 | Nonparametric Combinatorial Sequence Models - Wauthier, Jordan, Jojic - 2011

> Context: same passage as the Hall et al. (2008) entry above, which cites [17] and [10] in the same sentence.