Dynamic topic models
 In ICML
, 2006
Scientists need new tools to explore and browse large collections of scholarly literature. Thanks to organizations such as JSTOR, which scan and index the original bound archives of many journals, modern scientists can search digital libraries spanning hundreds of years. A scientist, suddenly
Model-Based Clustering and Data Transformations for Gene Expression Data
, 2001
Motivation: Clustering is a useful exploratory technique for the analysis of gene expression data. Many different heuristic clustering algorithms have been proposed in this context. Clustering algorithms based on probability models offer a principled alternative to heuristic algorithms. In particular, model-based clustering assumes that the data is generated by a finite mixture of underlying probability distributions such as multivariate normal distributions. The issues of selecting a 'good' clustering method and determining the 'correct' number of clusters are reduced to model selection problems in the probability framework. Gaussian mixture models have been shown to be a powerful tool for clustering in many applications.
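The idea of turning "how many clusters?" into a model selection problem can be illustrated with a minimal sketch. It is not the paper's method: it assumes one-dimensional data, plain EM, and BIC as the selection criterion, and the helper names (normal_pdf, fit_gmm_1d, bic) and the synthetic data are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def mixture_log_likelihood(data, weights, means, variances):
    """Log-likelihood of the data under a finite Gaussian mixture."""
    total = 0.0
    for x in data:
        density = sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))
        total += math.log(max(density, 1e-300))  # guard against underflow
    return total


def fit_gmm_1d(data, k, iters=200, seed=0):
    """Fit a k-component 1-D Gaussian mixture by EM (illustrative, not optimised)."""
    rng = random.Random(seed)
    n = len(data)
    grand_mean = sum(data) / n
    grand_var = sum((x - grand_mean) ** 2 for x in data) / n
    means = rng.sample(data, k)      # initialise means at random data points
    variances = [grand_var] * k      # start every component with the overall variance
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            s = max(sum(p), 1e-300)
            resp.append([pj / s for pj in p])
        # M-step: re-estimate weights, means, and variances
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-3)  # variance floor avoids degenerate spikes
    log_lik = mixture_log_likelihood(data, weights, means, variances)
    return log_lik, weights, means, variances


def bic(log_lik, k, n):
    """Bayesian information criterion; lower is better."""
    n_params = 3 * k - 1  # k means, k variances, k - 1 free mixing weights
    return -2.0 * log_lik + n_params * math.log(n)


# Synthetic data (an assumption for illustration): two well-separated groups.
data_rng = random.Random(1)
data = [data_rng.gauss(0.0, 1.0) for _ in range(100)] + \
       [data_rng.gauss(10.0, 1.0) for _ in range(100)]

fits = {k: fit_gmm_1d(data, k) for k in (1, 2, 3)}
scores = {k: bic(fits[k][0], k, len(data)) for k in fits}
for k in sorted(scores):
    print(f"k={k}  BIC={scores[k]:.1f}")
```

Comparing the printed BIC values across k is the model selection step; a real analysis of expression data would use multivariate components and consider data transformations first, as the abstract indicates.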
Shared logistic normal distributions for soft parameter tying in unsupervised grammar induction
 In Proceedings of NAACL-HLT 2009. Shay
, 2009
We present a family of priors over probabilistic grammar weights, called the shared logistic normal distribution. This family extends the partitioned logistic normal distribution, enabling factored covariance between the probabilities of different derivation events in the probabilistic grammar, providing a new way to encode prior knowledge about an unknown grammar. We describe a variational EM algorithm for learning a probabilistic grammar based on this family of priors. We then experiment with unsupervised dependency grammar induction and show significant improvements using our model for both monolingual learning and bilingual learning with a nonparallel, multilingual corpus.
A Statistical Model for Multiparty Electoral Data
 American Political Science Review
, 1999
We propose a comprehensive statistical model for analyzing multiparty, district-level elections. This model, which provides a tool for comparative politics research analogous to that which regression analysis provides in the American two-party context, can be used to explain or predict how geographic distributions of electoral results depend upon economic conditions, neighborhood ethnic compositions, campaign spending, and other features of the election campaign or aggregate areas. We also provide new graphical representations for data exploration, model evaluation, and substantive interpretation. We illustrate the use of this model by attempting to resolve a controversy over the size of and trend in the electoral advantage of incumbency in Britain. Contrary to previous analyses, all based on measures now known to be biased, we demonstrate that the advantage is small but meaningful, varies substantially across the parties, and is not growing. Finally, we show how to estimate the party from which each party's advantage is predominantly drawn. We propose the first internally consistent statistical model for analyzing multiparty, district-level aggregate election data. Our model can
Bayesian wavelet regression on curves with application to a spectroscopic calibration problem
 Journal of the American Statistical Association
, 2001
Motivated by calibration problems in near-infrared (NIR) spectroscopy, we consider the linear regression setting in which the many predictor variables arise from sampling an essentially continuous curve at equally spaced points and there may be multiple predictands. We tackle this regression problem by calculating the wavelet transforms of the discretized curves, then applying a Bayesian variable selection method using mixture priors to the multivariate regression of predictands on wavelet coefficients. For prediction purposes, we average over a set of likely models. Applied to a particular problem in NIR spectroscopy, this approach was able to find subsets of the wavelet coefficients with overall better predictive performance than the more usual approaches. In the application, the available predictors are measurements of the NIR reflectance spectrum of biscuit dough pieces at 256 equally spaced wavelengths. The aim is to predict the composition (i.e., the fat, flour, sugar, and water content) of the dough pieces using the spectral variables. Thus we have a multivariate regression of four predictands on 256 predictors with quite high intercorrelation among the predictors. A training set of 39 samples is available to fit this regression. Applying a wavelet transform replaces the 256 measurements on each spectrum with 256 wavelet coefficients that carry the same information. The variable selection method could use subsets of these coefficients that gave good predictions for all four compositional variables on a separate test set of samples. Selecting in the wavelet domain rather than from the original spectral variables is appealing in this application, because a single wavelet coefficient can carry information from a band of wavelengths in the original spectrum. This band can be narrow or wide, depending on the scale of the wavelet selected.
Bayesian forecasting of multinomial time series through conditionally Gaussian dynamic models
 Journal of the American Statistical Association
, 1997
 J. Neurosci
, 1999
Sex-related differences in behavior are extensive, but their neuroanatomic substrate is unclear. Indirect perfusion data have suggested a higher percentage of gray matter (GM) in left hemisphere cortex and in women, but differences in volumes of the major cranial compartments have not been examined for the entire brain in association with cognitive performance. We used volumetric segmentation of dual echo (proton density and T2-weighted) magnetic resonance imaging (MRI) scans in healthy volunteers (40 men, 40 women) age 18–45. Supertentorial volume was segmented into GM, white matter (WM), and CSF. We confirmed that women have a higher percentage of GM, whereas men have a higher percentage of WM and of CSF. These differences sustained a correction for total intracranial volume. In men the slope of the relation between cranial volume and GM paralleled that for WM, whereas in women the increase
Propagating Imprecise Probabilities In Bayesian Networks
 Artificial Intelligence
, 1996
Often experts are incapable of providing 'exact' probabilities; likewise, samples on which the probabilities in networks are based must often be small and preliminary. In such cases the probabilities in the networks are imprecise. The imprecision can be handled by second-order probability distributions. It is convenient to use beta or Dirichlet distributions to express the uncertainty about probabilities. The problem of how to propagate point probabilities in a Bayesian network now is transformed into the problem of how to propagate Dirichlet distributions in Bayesian networks. It is shown that the propagation of Dirichlet distributions in Bayesian networks with incomplete data results in a system of probability mixtures of beta-binomial and Dirichlet distributions. Approximate first-order probabilities and their second-order probability density functions are obtained by stochastic simulation. A number of properties of the propagation of imprecise probabilities are discuss...
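The stochastic-simulation idea can be sketched for the simplest possible case. The following is an illustration under stated assumptions, not the paper's algorithm: a hypothetical two-node network A -> B with binary variables, independent beta distributions on P(A), P(B|A), and P(B|not A), and no incomplete data. The function name and the beta parameters are invented for the example.

```python
import random


def propagate_beta_uncertainty(prior_a, b_given_a, b_given_not_a,
                               n_samples=20000, seed=0):
    """Monte Carlo samples of P(B) in a two-node network A -> B.

    Each argument is an (alpha, beta) pair describing a beta distribution
    over one imprecise probability (hypothetical values for illustration).
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        p_a = rng.betavariate(*prior_a)            # draw P(A)
        p_b_a = rng.betavariate(*b_given_a)        # draw P(B | A)
        p_b_na = rng.betavariate(*b_given_not_a)   # draw P(B | not A)
        # Ordinary first-order propagation for this particular draw
        samples.append(p_a * p_b_a + (1.0 - p_a) * p_b_na)
    return samples


samples = propagate_beta_uncertainty((2, 8), (9, 1), (1, 9))
mean_p_b = sum(samples) / len(samples)             # approximate first-order P(B)
ordered = sorted(samples)
interval = (ordered[int(0.025 * len(ordered))],    # rough 95% interval from
            ordered[int(0.975 * len(ordered))])    # the simulated second-order density
print(f"mean P(B) ~ {mean_p_b:.3f}, 95% interval ~ ({interval[0]:.3f}, {interval[1]:.3f})")
```

The sample mean approximates the first-order probability of B, and the spread of the samples approximates its second-order distribution. Because the three draws are independent, the mean should lie close to 0.2 * 0.9 + 0.8 * 0.1 = 0.26 for these parameters.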
Markov Topic Models
We develop Markov topic models (MTMs), a novel family of generative probabilistic models that can learn topics simultaneously from multiple corpora, such as papers from different conferences. We apply Gaussian (Markov) random fields to model the correlations of different corpora. MTMs capture both the internal topic structure within each corpus and the relationships between topics across the corpora. We derive an efficient estimation procedure with variational expectation-maximization. We study the performance of our models on a corpus of abstracts from six different computer science conferences. Our analysis reveals qualitative discoveries that are not possible with traditional topic models, and improved quantitative performance over the state of the art.
Likelihood and nonparametric Bayesian MCMC inference for spatial point processes based on perfect simulation and path sampling
 Scand. J. Statist
, 2003
We consider the combination of path sampling and perfect simulation in the context of both likelihood inference and nonparametric Bayesian inference for pairwise interaction point processes. Several empirical results based on simulations and analysis of a dataset are presented, and the merits of using perfect simulation are discussed.