Results 1–10 of 23
Kernel stick-breaking processes
, 2007
Abstract

Cited by 39 (11 self)
Summary. This article proposes a class of kernel stick-breaking processes (KSBP) for uncountable collections of dependent random probability measures. The KSBP is constructed by first introducing an infinite sequence of random locations. Independent random probability measures and beta-distributed random weights are assigned to each location. Predictor-dependent random probability measures are then constructed by mixing over the locations, with stick-breaking probabilities expressed as a kernel multiplied by the beta weights. Some theoretical properties of the KSBP are described, including a covariate-dependent prediction rule. A retrospective MCMC algorithm is developed for posterior computation, and the methods are illustrated using a simulated example and an epidemiologic application.
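The construction described in this abstract — beta weights scaled by a predictor-dependent kernel before stick-breaking — can be sketched numerically. This is a minimal illustrative sketch, not the paper's exact specification: the Gaussian kernel, the bandwidth, the uniform location draw, and the finite truncation are all assumptions.

```python
import numpy as np

def ksbp_weights(x, locations, V, bandwidth=1.0):
    """Stick-breaking weights of a KSBP-style prior at predictor value x.

    locations: random component locations in predictor space;
    V: beta-distributed stick-breaking variables, one per location.
    The Gaussian kernel and the truncation at len(V) components are
    illustrative assumptions, not the paper's exact choices.
    """
    # kernel takes values in [0, 1], shrinking with distance from x
    K = np.exp(-0.5 * ((x - locations) / bandwidth) ** 2)
    U = K * V  # kernel-multiplied beta weights
    # stick-breaking: pi_h = U_h * prod_{l < h} (1 - U_l)
    leftover = np.concatenate(([1.0], np.cumprod(1.0 - U)[:-1]))
    return U * leftover

rng = np.random.default_rng(0)
H = 50                                  # truncation level (assumption)
locations = rng.uniform(-3.0, 3.0, H)
V = rng.beta(1.0, 1.0, H)
w = ksbp_weights(0.5, locations, V)     # weights of the measure at x = 0.5
```

Because the kernel discounts distant locations, measures at nearby predictor values place large weights on the same components, which is the source of dependence across the predictor space.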
Nonparametric Bayes pachinko allocation
 In UAI
, 2007
Abstract

Cited by 30 (7 self)
Recent advances in topic models have explored complicated structured distributions to represent topic correlation. For example, the pachinko allocation model (PAM) captures arbitrary, nested, and possibly sparse correlations between topics using a directed acyclic graph (DAG). While PAM provides more flexibility and greater expressive power than previous models like latent Dirichlet allocation (LDA), it is also more difficult to determine the appropriate topic structure for a specific dataset. In this paper, we propose a nonparametric Bayesian prior for PAM based on a variant of the hierarchical Dirichlet process (HDP). Although the HDP can capture topic correlations defined by nested data structure, it does not automatically discover such correlations from unstructured data. By assuming an HDP-based prior for PAM, we are able to learn both the number of topics and how the topics are correlated. We evaluate our model on synthetic and real-world text datasets, and show that nonparametric PAM achieves performance matching the best of PAM without manually tuning the number of topics.
The Matrix Stick-Breaking Process for Flexible Multi-Task Learning
Abstract

Cited by 19 (3 self)
In multi-task learning our goal is to design regression or classification models for each of the tasks and appropriately share information between tasks. A Dirichlet process (DP) prior can be used to encourage task clustering. However, the DP prior does not allow local clustering of tasks with respect to a subset of the feature vector without making independence assumptions. Motivated by this problem, we develop a new multi-task learning prior, termed the matrix stick-breaking process (MSBP), which encourages cross-task sharing of data. However, the MSBP allows separate clustering and borrowing of information for the different feature components. This is important when tasks are more closely related for certain features than for others. Bayesian inference proceeds by a Gibbs sampling algorithm and the approach is illustrated using a simulated example and a multinational application.
Penalized model-based clustering with application to variable selection
 Journal of Machine Learning Research
, 2007
Abstract

Cited by 12 (3 self)
Variable selection in clustering analysis is both challenging and important. In the context of model-based clustering analysis with a common diagonal covariance matrix, which is especially suitable for “high dimension, low sample size” settings, we propose a penalized likelihood approach with an L1 penalty function, automatically realizing variable selection via thresholding and delivering a sparse solution. We derive an EM algorithm to fit our proposed model, and propose a modified BIC as a model selection criterion to choose the number of components and the penalization parameter. A simulation study and an application to gene function prediction with gene expression profiles demonstrate the utility of our method.
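The “variable selection via thresholding” that an L1 penalty induces reduces, in the M-step, to a soft-thresholding update of the cluster means on standardized data. A generic sketch, assuming the standard lasso-style operator; the paper's exact scaling of the penalty by cluster responsibilities is omitted here.

```python
import numpy as np

def soft_threshold(mu_tilde, lam):
    """Soft-thresholding of unpenalized cluster-mean estimates.

    Means whose magnitude falls below lam are set exactly to zero,
    so the corresponding variable is deselected for that cluster.
    Generic lasso-style update; an illustrative simplification of
    the penalized M-step.
    """
    return np.sign(mu_tilde) * np.maximum(np.abs(mu_tilde) - lam, 0.0)

# rows: clusters, columns: variables (on the standardized scale)
mu_hat = np.array([[0.05, -2.0, 0.3],
                   [-0.01, 1.5, -0.4]])
mu_pen = soft_threshold(mu_hat, 0.1)
# a variable whose penalized mean is zero in every cluster carries no
# clustering information and is effectively removed
```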
Nonparametric Bayes conditional distribution modeling with variable selection
 Journal of the American Statistical Association
, 2009
Abstract

Cited by 11 (7 self)
This article considers methodology for flexibly characterizing the relationship between a response and multiple predictors. Goals are (1) to estimate the conditional response distribution addressing the distributional changes across the predictor space, and (2) to identify important predictors for the response distribution change both within local regions and globally. We first introduce the probit stick-breaking process (PSBP) as a prior for an uncountable collection of predictor-dependent random probability measures and propose a PSBP mixture (PSBP-M) of normal regressions for modeling the conditional distributions. A global variable selection structure is incorporated to discard unimportant predictors, while allowing estimation of posterior inclusion probabilities. Local variable selection is conducted relying on the conditional distribution estimates at different predictor points. An efficient stochastic search sampling algorithm is proposed for posterior computation. The methods are illustrated through simulation and applied to an epidemiologic study.
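The probit stick-breaking weights this abstract refers to can be sketched as follows. The linear predictor inside the probit link and the finite truncation are illustrative assumptions, not the paper's exact specification.

```python
import math
import numpy as np

def psbp_weights(x, alpha, beta):
    """Probit stick-breaking weights at predictor value x.

    pi_h(x) = Phi(a_h + x*b_h) * prod_{l<h} (1 - Phi(a_l + x*b_l)),
    where Phi is the standard normal CDF, so each stick fraction lies
    in (0, 1). The linear predictor and the truncation at len(alpha)
    components are illustrative assumptions.
    """
    Phi = np.array([0.5 * (1.0 + math.erf((a + x * b) / math.sqrt(2.0)))
                    for a, b in zip(alpha, beta)])
    leftover = np.concatenate(([1.0], np.cumprod(1.0 - Phi)[:-1]))
    return Phi * leftover

rng = np.random.default_rng(0)
H = 30
alpha = rng.normal(0.0, 1.0, H)   # per-component intercepts (assumption)
beta = rng.normal(0.0, 1.0, H)    # per-component predictor effects (assumption)
w = psbp_weights(0.5, alpha, beta)
```

Replacing the beta stick fractions of the usual Dirichlet-process construction with probit-transformed Gaussian variables is what lets the weights vary smoothly with the predictor.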
Nonparametric Bayes applications to biostatistics
 In Bayesian Nonparametrics: Principles and Practice
, 2010
Abstract

Cited by 9 (0 self)
Biomedical research has clearly evolved at a dramatic rate in the past decade, with improvements in technology leading to a fundamental shift in the way in which data are collected and analyzed. Before this paradigm shift, studies were most commonly designed to be simple and to focus on relationships among a few variables of primary interest. For example, in …
The Local Dirichlet Process
Abstract

Cited by 7 (1 self)
As a generalization of the Dirichlet process to allow predictor dependence, we propose a local Dirichlet process (lDP). The lDP provides a prior distribution for a collection of random probability measures indexed by predictors. This is accomplished by assigning stick-breaking weights and atoms to random locations in a predictor space. The probability measure at a given predictor value is then formulated using the weights and atoms located in a neighborhood about that predictor value. This construction results in a marginal Dirichlet process prior for the random measure at any specific predictor value. Dependence is induced through local sharing of random components. Theoretical properties are considered and a blocked Gibbs sampler is proposed for posterior computation in lDP mixture models. The methods are illustrated using simulated examples and an epidemiologic application.
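The neighborhood construction in this abstract — stick-breaking over only those components whose random locations fall near the predictor value — can be sketched as follows. The interval neighborhood, its radius psi, the uniform location draw, and the finite truncation are illustrative assumptions.

```python
import numpy as np

def ldp_weights(x, locations, V, psi=1.0):
    """Local-Dirichlet-process-style weights at predictor value x.

    Only components whose random locations fall within distance psi of x
    enter the stick-breaking construction, in index order; components
    outside the neighborhood get weight zero. psi and the truncation at
    len(V) components are illustrative assumptions.
    """
    idx = np.where(np.abs(locations - x) < psi)[0]  # local components
    V_local = V[idx]
    leftover = np.concatenate(([1.0], np.cumprod(1.0 - V_local)[:-1]))
    w = np.zeros_like(V)
    w[idx] = V_local * leftover                     # stick-break over locals only
    return w

rng = np.random.default_rng(1)
H = 100
locations = rng.uniform(-3.0, 3.0, H)
V = rng.beta(1.0, 2.0, H)
w = ldp_weights(0.0, locations, V)
```

Nearby predictor values have overlapping neighborhoods and hence share components, which induces the dependence the abstract describes, while the measure at any single predictor value marginally follows a Dirichlet process.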
Graph Mining with Variational Dirichlet Process Mixture Models
, 2008
Abstract

Cited by 4 (2 self)
Graph data such as chemical compounds and XML documents are becoming more common in many application domains. A main difficulty of graph data processing lies in the intrinsic high dimensionality of graphs: when a graph is represented as a binary feature vector of indicators of all possible subgraph patterns, the dimensionality becomes too large for standard statistical methods. We propose a nonparametric Bayesian method for clustering graphs and selecting salient patterns at the same time. Variational inference is adopted here, because sampling is not applicable due to the extremely high dimensionality. The feature set minimizing the free energy is efficiently collected with the DFS code tree, where the generation of useless subgraphs is suppressed by a tree-pruning condition. In experiments, our method is compared with a simpler approach based on frequent subgraph mining, and with graph kernels.
Pachinko allocation: Scalable mixture models of topic correlations
 Journal of Machine Learning Research. Submitted
, 2008
Abstract

Cited by 2 (0 self)
Statistical topic models are increasingly popular tools for summarization and manifold discovery in discrete data. However, the majority of existing approaches capture no or limited correlations between topics. In this paper, we propose the pachinko allocation model (PAM), which captures arbitrary topic correlations using a directed acyclic graph (DAG). The leaves of the DAG represent individual words in the vocabulary, while each interior node represents a correlation among its children, which may be words or other interior nodes (topics). As we have observed, topic correlations are usually sparse. By taking advantage of this property, we develop a highly scalable inference algorithm for PAM. In our experiments, we show improved performance of PAM in document classification, likelihood of held-out data, topical keyword coherence, and the ability to support a great number of fine-grained topics in very large datasets.
Keywords: pachinko allocation, topic models, Gibbs sampling
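The DAG-structured generative process this abstract describes can be sketched for the simplest four-level case (root, super-topics, sub-topics, words). All names, shapes, and symmetric Dirichlet priors below are illustrative assumptions, not the paper's notation.

```python
import numpy as np

def sample_pam_document(n_words, alpha_root, alpha_super, word_dists, rng):
    """Sample one document from a minimal four-level PAM-style model
    (root -> super-topics -> sub-topics -> words).

    alpha_root: Dirichlet prior over S super-topics at the root;
    alpha_super: (S, T) array of Dirichlet priors over T sub-topics;
    word_dists: (T, V) word distributions for the sub-topics.
    Names and shapes are illustrative assumptions.
    """
    theta_root = rng.dirichlet(alpha_root)                      # doc-level super-topic mix
    theta_super = np.array([rng.dirichlet(a) for a in alpha_super])
    words = []
    for _ in range(n_words):
        s = rng.choice(len(theta_root), p=theta_root)           # path: root -> super-topic
        t = rng.choice(theta_super.shape[1], p=theta_super[s])  # -> sub-topic
        w = rng.choice(word_dists.shape[1], p=word_dists[t])    # -> word
        words.append(int(w))
    return words

rng = np.random.default_rng(0)
S, T, V = 2, 3, 10
word_dists = rng.dirichlet(np.ones(V), size=T)
doc = sample_pam_document(20, np.ones(S), np.ones((S, T)), word_dists, rng)
```

Unlike LDA, each document draws a separate multinomial at every interior node of the DAG, so correlations among sub-topics are carried by the super-topic mixes rather than by a single flat topic distribution.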