Results 1–10 of 42
Stochastic Variational Inference
Journal of Machine Learning Research, 2013 (in press)
"... We develop stochastic variational inference, a scalable algorithm for approximating posterior distributions. We develop this technique for a large class of probabilistic models and we demonstrate it with two probabilistic topic models, latent Dirichlet allocation and the hierarchical Dirichlet proce ..."
Abstract

Cited by 131 (27 self)
We develop stochastic variational inference, a scalable algorithm for approximating posterior distributions. We develop this technique for a large class of probabilistic models and we demonstrate it with two probabilistic topic models, latent Dirichlet allocation and the hierarchical Dirichlet process topic model. Using stochastic variational inference, we analyze several large collections of documents: 300K articles from Nature, 1.8M articles from The New York Times, and 3.8M articles from Wikipedia. Stochastic inference can easily handle data sets of this size and outperforms traditional variational inference, which can only handle a smaller subset. (We also show that the Bayesian nonparametric topic model outperforms its parametric counterpart.) Stochastic variational inference lets us apply complex Bayesian models to massive data sets.
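As a rough illustration of the stochastic natural-gradient update the abstract describes, the sketch below applies SVI to a toy conjugate model (a Gaussian mean with known variance) rather than to LDA or the HDP topic model; the learning-rate schedule, the variable names, and the model itself are illustrative assumptions, not the authors' implementation.

```python
# Minimal SVI sketch on a toy conjugate model (Gaussian mean, known variance).
import numpy as np

rng = np.random.default_rng(0)
N, sigma2, prior_mean, prior_var = 10_000, 1.0, 0.0, 10.0
data = rng.normal(2.0, np.sqrt(sigma2), size=N)

# Global variational parameters in natural form: (precision * mean, precision).
lam = np.array([prior_mean / prior_var, 1.0 / prior_var])

tau0, kappa = 1.0, 0.7      # decaying step size rho_t = (tau0 + t)^(-kappa)
for t in range(2000):
    x = data[rng.integers(N)]                  # sample one data point
    # Intermediate parameter: pretend the whole corpus were N copies of x.
    lam_hat = np.array([prior_mean / prior_var + N * x / sigma2,
                        1.0 / prior_var + N / sigma2])
    rho = (tau0 + t) ** (-kappa)
    lam = (1.0 - rho) * lam + rho * lam_hat    # noisy natural-gradient step

posterior_mean = lam[0] / lam[1]
print(posterior_mean, data.mean())             # close for large N
```

At the fixed point the natural parameters match the exact conjugate posterior; in the paper the same kind of noisy update is applied to the global (topic) variational parameters of LDA and the HDP.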
Asymptotically exact, embarrassingly parallel MCMC
 In Proceedings of the 30th Conference on Uncertainty in Artificial Intelligence (UAI '14), 2014
"... Abstract Communication costs, resulting from synchronization requirements during learning, can greatly slow down many parallel machine learning algorithms. In this paper, we present a parallel Markov chain Monte Carlo (MCMC) algorithm in which subsets of data are processed independently, with very ..."
Abstract

Cited by 18 (0 self)
Communication costs, resulting from synchronization requirements during learning, can greatly slow down many parallel machine learning algorithms. In this paper, we present a parallel Markov chain Monte Carlo (MCMC) algorithm in which subsets of data are processed independently, with very little communication. First, we arbitrarily partition data onto multiple machines. Then, on each machine, any classical MCMC method (e.g., Gibbs sampling) may be used to draw samples from a posterior distribution given the data subset. Finally, the samples from each machine are combined to form samples from the full posterior. This embarrassingly parallel algorithm allows each machine to act independently on a subset of the data (without communication) until the final combination stage. We prove that our algorithm generates asymptotically exact samples and empirically demonstrate its ability to parallelize burn-in and sampling in several models.
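A minimal sketch of the final combination stage, assuming the simple parametric (Gaussian-product) rule for merging subposterior samples; the toy one-dimensional model, flat prior, and direct sampling in place of a real MCMC chain are illustrative simplifications, not the paper's experiments.

```python
# Combine per-machine "subposterior" samples with a Gaussian-product rule.
import numpy as np

rng = np.random.default_rng(1)
M, N = 4, 8_000                                  # machines, data points
data = rng.normal(3.0, 1.0, size=N)
shards = np.array_split(data, M)

sub_samples = []
for shard in shards:
    # Each machine targets p(theta)^(1/M) * p(shard | theta).  With a flat
    # prior and unit-variance Gaussian likelihood this is Gaussian, so we
    # sample it directly instead of running a full MCMC chain.
    post_mean, post_var = shard.mean(), 1.0 / len(shard)
    sub_samples.append(rng.normal(post_mean, np.sqrt(post_var), size=5_000))

# Gaussian product: precisions add, means are precision-weighted.
precisions = np.array([1.0 / s.var() for s in sub_samples])
means = np.array([s.mean() for s in sub_samples])
full_precision = precisions.sum()
full_mean = (precisions * means).sum() / full_precision
print(full_mean, data.mean())                    # combined mean vs. full-data MLE
```

The paper also develops nonparametric and semiparametric combination rules that remain asymptotically exact beyond the Gaussian case.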
Learning from Crowds in the Presence of Schools of Thought
2012
"... Crowdsourcing has recently become popular among machine learning researchers and social scientists as an effective way to collect largescale experimental data from distributed workers. To extract useful information from the cheap but potentially unreliable answers to tasks, a key problem is to iden ..."
Abstract

Cited by 12 (0 self)
Crowdsourcing has recently become popular among machine learning researchers and social scientists as an effective way to collect large-scale experimental data from distributed workers. To extract useful information from the cheap but potentially unreliable answers to tasks, a key problem is to identify reliable workers as well as unambiguous tasks. For objective tasks that have one correct answer per task, previous work can estimate worker reliability and task clarity based on the single-gold-standard assumption; but for tasks that are subjective and accept multiple reasonable answers into which workers may be grouped, a phenomenon called schools of thought, existing models cannot be trivially applied. In this work, we present a statistical model to estimate worker reliability and task clarity without resorting to the single-gold-standard assumption. This is instantiated by explicitly characterizing the grouping behavior that forms schools of thought with a rank-1 factorization of a worker-task group-size matrix. Instead of performing an intermediate inference step, which can be expensive and unstable, we present an algorithm to analytically compute the sizes of different groups. We perform extensive empirical studies on real data collected from Amazon Mechanical Turk. Our method discovers the schools of thought, gives reasonable estimates of worker reliability and task clarity, and is robust to hyperparameter changes. Furthermore, our estimated worker reliability can be used to improve the gold-standard prediction for objective tasks.
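The rank-1 factorization idea can be illustrated with a plain SVD on a synthetic worker-task group-size matrix; this leaves out the paper's actual statistical model and its analytic computation of group sizes, and every variable name below is hypothetical.

```python
# Rank-1 view of a worker-task group-size matrix via the leading SVD factors.
import numpy as np

rng = np.random.default_rng(2)
workers, tasks = 50, 30
worker_score = rng.uniform(0.5, 1.0, size=workers)   # stand-in for reliability
task_score = rng.uniform(0.5, 1.0, size=tasks)        # stand-in for clarity
# Hypothetical observed group sizes: roughly rank-1 plus noise.
S = np.outer(worker_score, task_score) * 10 + rng.normal(0, 0.3, (workers, tasks))

U, s, Vt = np.linalg.svd(S, full_matrices=False)
worker_factor = U[:, 0] * np.sqrt(s[0])   # one score per worker
task_factor = Vt[0] * np.sqrt(s[0])       # one score per task
approx = np.outer(worker_factor, task_factor)
print(np.linalg.norm(S - approx) / np.linalg.norm(S))  # small relative error
```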
Cognitive control over learning: Creating, clustering and generalizing task-set structure
 Psychological Review, 2013
"... Learning and executive functions such as taskswitching share common neural substrates, notably prefrontal cortex and basal ganglia. Understanding how they interact requires studying how cognitive control facilitates learning but also how learning provides the (potentially hidden) structure, such as ..."
Abstract

Cited by 12 (1 self)
Learning and executive functions such as task-switching share common neural substrates, notably prefrontal cortex and basal ganglia. Understanding how they interact requires studying how cognitive control facilitates learning but also how learning provides the (potentially hidden) structure, such as abstract rules or task-sets, needed for cognitive control. We investigate this question from 3 complementary angles. First, we develop a new context-task-set (CTS) model, inspired by nonparametric Bayesian methods, specifying how the learner might infer hidden structure (hierarchical rules) and decide to reuse or create new structure in novel situations. Second, we develop a neurobiologically explicit network model to assess mechanisms of such structured learning in hierarchical frontal cortex and basal ganglia circuits. We systematically explore the link between these modeling levels across task demands. We find that the network provides an approximate implementation of high-level CTS computations, with specific neural mechanisms modulating distinct CTS parameters. Third, this synergism yields predictions about the nature of human optimal and suboptimal choices and response times during learning and task-switching. In particular, the models suggest that participants spontaneously build task-set structure into a learning problem when not cued to do so, which predicts positive and negative transfer in subsequent generalization tests. We provide experimental evidence for these predictions and show that CTS provides a good quantitative fit to human sequences of choices. These findings implicate a strong tendency to interactively engage cognitive control and learning, resulting in structured abstract representations that afford generalization opportunities and, thus, potentially long-term rather than short-term optimality.
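The "reuse or create" decision in the abstract is reminiscent of a Chinese-restaurant-process prior over task-sets; the sketch below shows only that generic nonparametric clustering prior, with an illustrative concentration parameter, not the full CTS model or its neural-network implementation.

```python
# Chinese-restaurant-process assignment of contexts to task-sets.
import numpy as np

rng = np.random.default_rng(3)
alpha = 1.0                     # concentration: higher -> more new task-sets
context_to_taskset = []         # task-set index assigned to each context
counts = []                     # how many contexts use each existing task-set

for context in range(20):
    probs = np.array(counts + [alpha], dtype=float)
    probs /= probs.sum()
    choice = rng.choice(len(probs), p=probs)
    if choice == len(counts):   # create a new task-set
        counts.append(1)
    else:                       # reuse a popular existing task-set
        counts[choice] += 1
    context_to_taskset.append(choice)

print(context_to_taskset)
```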
Deep unsupervised learning using nonequilibrium thermodynamics
 In Proceedings of the 32nd International Conference on Machine Learning (ICML), 2015
"... A central problem in machine learning involves modeling complex datasets using highly flexible families of probability distributions in which learning, sampling, inference, and evaluation are still analytically or computationally tractable. Here, we develop an approach that simultaneously achieve ..."
Abstract

Cited by 5 (0 self)
A central problem in machine learning involves modeling complex datasets using highly flexible families of probability distributions in which learning, sampling, inference, and evaluation are still analytically or computationally tractable. Here, we develop an approach that simultaneously achieves both flexibility and tractability. The essential idea, inspired by nonequilibrium statistical physics, is to systematically and slowly destroy structure in a data distribution through an iterative forward diffusion process. We then learn a reverse diffusion process that restores structure in data, yielding a highly flexible and tractable generative model of the data. This approach allows us to rapidly learn, sample from, and evaluate probabilities in deep generative models with thousands of layers or time steps, as well as to compute conditional and posterior probabilities under the learned model. We additionally release an open source reference implementation of the algorithm.
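A minimal sketch of the forward (structure-destroying) half of the approach on a toy one-dimensional bimodal dataset; the linear beta schedule and step count are assumptions for illustration, and the learned reverse process is omitted entirely.

```python
# Forward diffusion: repeatedly add a little Gaussian noise so the data
# distribution is gradually turned into (nearly) an isotropic Gaussian.
import numpy as np

rng = np.random.default_rng(4)
x0 = rng.choice([-1.0, 1.0], size=1_000)        # toy bimodal "dataset"
T = 200
betas = np.linspace(1e-4, 0.05, T)              # illustrative noise schedule

x = x0.copy()
for beta in betas:
    # q(x_t | x_{t-1}) = N(sqrt(1 - beta) * x_{t-1}, beta)
    x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * rng.normal(size=x.shape)

# Mean absolute value: 1.0 for the bimodal data vs. roughly 0.8 for a unit
# Gaussian, showing the original structure has been destroyed.
print(np.mean(np.abs(x0)), np.mean(np.abs(x)))
```

The generative model of the paper parameterizes the reverse of this chain, so sampling runs the noising process backwards step by step.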
Mining Idioms from Source Code
"... We present the first method for automatically mining code idioms from a corpus of previously written, idiomatic software projects. We take the view that a code idiom is a syntactic fragment that recurs across projects and has a single semantic purpose. Idioms may have metavariables, such as the body ..."
Abstract

Cited by 5 (2 self)
We present the first method for automatically mining code idioms from a corpus of previously written, idiomatic software projects. We take the view that a code idiom is a syntactic fragment that recurs across projects and has a single semantic purpose. Idioms may have metavariables, such as the body of a for loop. Modern IDEs commonly provide facilities for manually defining idioms and inserting them on demand, but this does not help programmers to write idiomatic code in languages or using libraries with which they are unfamiliar. We present Haggis, a system for mining code idioms that builds on recent advanced techniques from statistical natural language processing, namely, nonparametric Bayesian probabilistic tree substitution grammars. We apply Haggis to several of the most popular open source projects from GitHub. We present a wide range of evidence that the resulting idioms are semantically meaningful, demonstrating that they do indeed recur across software projects and that they occur more frequently in illustrative code examples collected from a Q&A site. Manual examination of the most common idioms indicates that they describe important program concepts, including object creation, exception handling, and resource management.
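To make the notion of a recurring syntactic fragment concrete, the toy sketch below counts shallow AST "shapes" across a few invented Python snippets; Haggis itself fits a nonparametric Bayesian tree-substitution grammar rather than raw frequency counts, so this is only the underlying intuition.

```python
# Count recurring shallow AST shapes across a few hypothetical snippets.
import ast
from collections import Counter

snippets = [
    "for i in range(10):\n    total += i",
    "for x in range(5):\n    total += x",
    "with open(path) as f:\n    data = f.read()",
]

def shape(node):
    """A depth-limited 'shape' of an AST node: node types only, no identifiers."""
    return (type(node).__name__,
            tuple(type(c).__name__ for c in ast.iter_child_nodes(node)))

counts = Counter()
for code in snippets:
    for node in ast.walk(ast.parse(code)):
        counts[shape(node)] += 1

print(counts.most_common(3))   # the most frequent fragment shapes
```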
Who to Follow and Why: Link Prediction with Explanations
"... User recommender systems are a key component in any online social networking platform: they help the users growing their network faster, thus driving engagement and loyalty. In this paper we study link prediction with explanations for user recommendation in social networks. For this problem we pro ..."
Abstract

Cited by 3 (0 self)
User recommender systems are a key component in any online social networking platform: they help users grow their networks faster, thus driving engagement and loyalty. In this paper we study link prediction with explanations for user recommendation in social networks. For this problem we propose WTFW (“Who to Follow and Why”), a stochastic topic model for link prediction over directed and node-attributed graphs. Our model not only predicts links, but for each predicted link it decides whether it is a “topical” or a “social” link, and depending on this decision it produces a different type of explanation. A topical link is recommended between a user interested in a topic and a user authoritative in that topic: the explanation in this case is a set of binary features describing the topic responsible for the link creation. A social link is recommended between users who share a large social neighborhood: in this case the explanation is the set of neighbors who are most likely to be responsible for the link creation. Our experimental assessment on real-world data confirms the accuracy of WTFW in link prediction and the quality of the associated explanations.
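A toy sketch of the topical-versus-social distinction the abstract draws, using made-up affinity scores instead of the WTFW probabilistic model; the function, its parameters, and the scaling constant are all hypothetical.

```python
# Choose between a "topical" and a "social" explanation for a recommended link.
def explain_link(interest, authority, common_neighbors, topic_names):
    """Toy explanation for recommending that u follow v.

    interest[t], authority[t]: u's interest and v's authority in topic t.
    common_neighbors: users in the shared social neighborhood of u and v.
    """
    topical_score = max(interest[t] * authority[t] for t in topic_names)
    social_score = len(common_neighbors) / 10.0        # illustrative scaling
    if topical_score >= social_score:
        best = max(topic_names, key=lambda t: interest[t] * authority[t])
        return f"topical: shared interest/authority in '{best}'"
    return f"social: {len(common_neighbors)} shared neighbors"

print(explain_link({"nlp": 0.9, "vision": 0.1},
                   {"nlp": 0.8, "vision": 0.2},
                   common_neighbors={"alice", "bob"},
                   topic_names=["nlp", "vision"]))
```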
Central Limit Theorems for an Indian Buffet Model with Random Weights
"... Abstract. The threeparameter Indian buffet process is generalized. The possibly different role played by customers is taken into account by suitable (random) weights. Various limit theorems are also proved for such generalized Indian buffet process. Let Ln be the number of dishes experimented by th ..."
Abstract

Cited by 1 (1 self)
The three-parameter Indian buffet process is generalized. The possibly different role played by customers is taken into account by suitable (random) weights. Various limit theorems are also proved for such a generalized Indian buffet process. Let L_n be the number of dishes tried by the first n customers, and let K_n = (1/n) ∑_{i=1}^{n} K_i, where K_i is the number of dishes tried by customer i. The asymptotic distributions of L_n and K_n, suitably centered and scaled, are obtained. The convergence turns out to be stable (and not only in distribution). As a particular case, the results apply to the standard (i.e., non-generalized) Indian buffet process.
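A small simulation of the standard (non-generalized) Indian buffet process, computing the quantities L_n and K_n = (1/n) ∑ K_i that the limit theorems concern; the concentration parameter alpha is an illustrative choice, and the random customer weights of the generalized process are not modeled.

```python
# Simulate the one-parameter Indian buffet process and track L_n and K_n.
import numpy as np

rng = np.random.default_rng(5)
alpha, n = 2.0, 500
dish_counts = []                 # how many customers have tried each dish
K = []                           # K_i: number of dishes tried by customer i

for i in range(1, n + 1):
    tried = 0
    for d, c in enumerate(dish_counts):
        if rng.random() < c / i:             # revisit a previous dish w.p. c/i
            dish_counts[d] += 1
            tried += 1
    new = rng.poisson(alpha / i)             # Poisson(alpha / i) new dishes
    dish_counts.extend([1] * new)
    tried += new
    K.append(tried)

L_n, K_bar = len(dish_counts), np.mean(K)
print(L_n, alpha * np.log(n))    # L_n grows like alpha * log(n)
print(K_bar)
```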
Structure and Flexibility in Bayesian Models of Cognition
"... Probability theory forms a natural framework for explaining the impressive success of people at solving many difficult inductive problems, such as learning words and categories, inferring the relevant features of objects, and identifying functional relationships. Probabilistic models of cognition u ..."
Abstract

Cited by 1 (1 self)
Probability theory forms a natural framework for explaining the impressive success of people at solving many difficult inductive problems, such as learning words and categories, inferring the relevant features of objects, and identifying functional relationships. Probabilistic models of cognition use Bayes' rule to identify probable structures or representations that could have generated a set of observations, whether the observations are sensory input or the output of other psychological processes. In this chapter we address an important question that arises within this framework: How do people infer representations that are complex enough to faithfully encode the world but not so complex that they “overfit” noise in the data? We discuss nonparametric Bayesian models as a potential answer to this question. To do so, first we present the mathematical background necessary to understand nonparametric Bayesian models. We then delve into nonparametric Bayesian models for three types of hidden structure: clusters, features, and functions. Finally, we conclude with a summary and discussion of open questions for future research.
Automatically Determining a Proper Length for Multi-document Summarization: A Bayesian Nonparametric Approach
"... Document summarization is an important task in the area of natural language processing, which aims to extract the most important information from a single document or a cluster of documents. In various summarization tasks, the summary length is manually defined. However, how to find the proper su ..."
Abstract

Cited by 1 (0 self)
Document summarization is an important task in the area of natural language processing, which aims to extract the most important information from a single document or a cluster of documents. In various summarization tasks, the summary length is manually defined. However, finding the proper summary length is quite a problem, and keeping all summaries restricted to the same length is not always a good choice. It is obviously improper to generate summaries of the same length for two clusters of documents that contain quite different quantities of information. In this paper, we propose a Bayesian nonparametric model for multi-document summarization in order to automatically determine the proper lengths of summaries. Assuming that an original document can be reconstructed from its summary, we describe the “reconstruction” by a Bayesian framework which selects sentences to form a good summary. Experimental results on DUC2004 data sets and some expanded data demonstrate the good quality of our summaries and the rationality of the length determination.
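A greedy, threshold-based sketch of the reconstruction intuition: keep adding the sentence that best reduces the reconstruction error of the document's term-frequency vector, and stop when the gain becomes small. The fixed threshold stands in for, and is much cruder than, the paper's Bayesian nonparametric length determination; the example sentences are invented.

```python
# Greedy sentence selection by reconstruction gain, with a stopping threshold.
import numpy as np
from collections import Counter

sentences = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "stock markets rose sharply today",
    "the cat and the dog are friends",
]
vocab = sorted({w for s in sentences for w in s.split()})

def tf(text):
    c = Counter(text.split())
    return np.array([c[w] for w in vocab], dtype=float)

doc = tf(" ".join(sentences))
chosen, residual = [], doc.copy()
while True:
    gains = []
    for i, s in enumerate(sentences):
        if i in chosen:
            gains.append(-np.inf)
            continue
        gains.append(np.linalg.norm(residual) - np.linalg.norm(residual - tf(s)))
    best = int(np.argmax(gains))
    if gains[best] < 0.5:          # illustrative stopping threshold
        break
    chosen.append(best)
    residual = residual - tf(sentences[best])

print([sentences[i] for i in chosen])   # summary length chosen automatically
```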