Results 1–10 of 96
The nested Chinese restaurant process and Bayesian inference of topic hierarchies
, 2007
Abstract

Cited by 100 (12 self)
We present the nested Chinese restaurant process (nCRP), a stochastic process which assigns probability distributions to infinitely-deep, infinitely-branching trees. We show how this stochastic process can be used as a prior distribution in a Bayesian nonparametric model of document collections. Specifically, we present an application to information retrieval in which documents are modeled as paths down a random tree, and the preferential attachment dynamics of the nCRP leads to clustering of documents according to sharing of topics at multiple levels of abstraction. Given a corpus of documents, a posterior inference algorithm finds an approximation to a posterior distribution over trees, topics and allocations of words to levels of the tree. We demonstrate this algorithm on collections of scientific abstracts from several journals. This model exemplifies a recent trend in statistical machine learning: the use of Bayesian nonparametric methods to infer distributions on flexible data structures.
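A minimal sketch of the generative step the abstract describes: each document picks a root-to-leaf path, and at every level an existing branch is chosen with probability proportional to how many earlier documents took it, or a new branch is opened with probability proportional to a concentration parameter. The function name, tree representation, and parameter `gamma` are illustrative, not from the paper.

```python
import random
from collections import defaultdict

def ncrp_sample_path(tree, depth, gamma=1.0, rng=random):
    """Sample one root-to-leaf path of the given depth from a nested CRP.

    `tree` maps a node (a tuple path prefix) to a dict child -> count of
    previous documents that passed through that child.  Each level is an
    independent Chinese restaurant process: an existing child is chosen
    with probability proportional to its count, a new child with
    probability proportional to gamma.
    """
    path = ()
    for _ in range(depth):
        children = tree[path]
        total = sum(children.values()) + gamma
        r = rng.uniform(0, total)
        acc = 0.0
        chosen = None
        for child, count in children.items():
            acc += count
            if r < acc:
                chosen = child
                break
        if chosen is None:               # landed in the gamma mass: new branch
            chosen = len(children)       # children are labeled 0, 1, 2, ...
        children[chosen] += 1
        path = path + (chosen,)
    return path

# Grow a small random topic tree from 20 simulated documents.
tree = defaultdict(lambda: defaultdict(int))
paths = [ncrp_sample_path(tree, depth=3, gamma=0.5) for _ in range(20)]
```

The preferential-attachment flavor is visible in the counts: branches that attracted early documents tend to keep attracting later ones.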
The Cross-Entropy Method for Combinatorial and Continuous Optimization
, 1999
Abstract

Cited by 94 (9 self)
We present a new and fast method, called the cross-entropy method, for finding the optimal solution of combinatorial and continuous non-convex optimization problems with convex bounded domains. To find the optimal solution we solve a sequence of simple auxiliary smooth optimization problems based on Kullback-Leibler cross-entropy, importance sampling, Markov chains and the Boltzmann distribution. We use importance sampling as an important ingredient for adaptive adjustment of the temperature in the Boltzmann distribution and use Kullback-Leibler cross-entropy to find the optimal solution. In fact, we use the mode of a unimodal importance sampling distribution, like the mode of a beta distribution, as an estimate of the optimal solution for continuous optimization, and a Markov chain approach for combinatorial optimization. In the latter case we show almost sure convergence of our algorithm to the optimal solution. Supporting numerical results for both continuous and combinatorial optimization problems are given as well. Our empirical studies suggest that the running time of the cross-entropy method is polynomial in the size of the problem.
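The core cross-entropy loop can be sketched in a few lines for the one-dimensional continuous case: sample from a parametric distribution, keep an elite fraction of the best samples, and refit the distribution to the elite so it concentrates around the optimum. This toy uses a Gaussian sampler rather than the paper's beta/Boltzmann machinery; all names and parameter values are illustrative.

```python
import random
import statistics

def cross_entropy_minimize(f, mu=0.0, sigma=5.0, n_samples=100,
                           n_elite=10, iters=50, rng=random):
    """Minimize f over the reals with a basic cross-entropy loop.

    Each iteration samples candidates from N(mu, sigma^2), keeps the
    n_elite best, and refits mu/sigma to that elite set, so the sampling
    distribution progressively concentrates around the minimizer.
    """
    for _ in range(iters):
        xs = [rng.gauss(mu, sigma) for _ in range(n_samples)]
        elite = sorted(xs, key=f)[:n_elite]        # lowest f-values win
        mu = statistics.mean(elite)
        sigma = statistics.stdev(elite) + 1e-12    # avoid a degenerate sampler
    return mu

random.seed(0)
best = cross_entropy_minimize(lambda x: (x - 2.0) ** 2)
```

Refitting to the elite set is exactly the Kullback-Leibler projection step the abstract alludes to, specialized to a Gaussian family where it reduces to taking the elite mean and standard deviation.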
Balanced Allocations: The Heavily Loaded Case
, 2006
Abstract

Cited by 63 (9 self)
We investigate balls-into-bins processes allocating m balls into n bins based on the multiple-choice paradigm. In the classical single-choice variant each ball is placed into a bin selected uniformly at random. In a multiple-choice process each ball can be placed into one out of d ≥ 2 randomly selected bins. It is known that in many scenarios having more than one choice for each ball can improve the load balance significantly. Formal analyses of this phenomenon prior to this work considered mostly the lightly loaded case, that is, when m ≈ n. In this paper we present the first tight analysis in the heavily loaded case, that is, when m ≫ n rather than m ≈ n. The best previously known results for the multiple-choice processes in the heavily loaded case were obtained using majorization by the single-choice process. This yields an upper bound on the maximum bin load of m/n + O(√(m ln n / n)) with high probability. We show, however, that the multiple-choice processes are fundamentally different from the single-choice variant in that they have “short memory.” The great consequence of this property is that the deviation of the multiple-choice processes from the optimal allocation (that is, the allocation in which each bin has either ⌊m/n⌋ or ⌈m/n⌉ balls) does not increase with the number of balls as in the case of the single-choice process. In particular, we investigate the allocation obtained by two different multiple-choice allocation schemes,
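The single- versus multiple-choice contrast is easy to observe empirically: place each ball in the least loaded of d uniformly chosen bins and compare the resulting maximum load. This is a simulation sketch of the two processes the abstract compares, with illustrative parameter values; it demonstrates the phenomenon, not the paper's analysis.

```python
import random

def max_load(m, n, d, rng=random):
    """Throw m balls into n bins; each ball goes to the least loaded of
    d bins chosen uniformly at random (d = 1 is the classical
    single-choice process).  Returns the maximum bin load."""
    loads = [0] * n
    for _ in range(m):
        choices = [rng.randrange(n) for _ in range(d)]
        best = min(choices, key=lambda i: loads[i])
        loads[best] += 1
    return max(loads)

random.seed(0)
n = 50
m = 50 * n                      # heavily loaded: m >> n
single = max_load(m, n, d=1)    # deviation grows like sqrt(m ln n / n)
double = max_load(m, n, d=2)    # deviation stays O(log log n), independent of m
```

With these parameters the two-choice maximum load stays within a few balls of the average m/n = 50, while the single-choice load typically overshoots by an order of magnitude more.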
Concentration inequalities and martingale inequalities – a survey
 Internet Math
Abstract

Cited by 45 (1 self)
Abstract. We examine a number of generalized and extended versions of concentration inequalities and martingale inequalities. These inequalities are effective for analyzing processes with quite general conditions, as illustrated in an example for an infinite Polya process and web graphs.
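The prototype of the martingale inequalities the survey covers is Azuma-Hoeffding: a martingale with increments bounded by 1 satisfies P(|S_n| ≥ t) ≤ 2·exp(−t²/2n). A quick empirical check against a ±1 random walk (the simplest such martingale) is sketched below; the setup is illustrative, not from the survey.

```python
import math
import random

def azuma_bound(n, t):
    """Azuma-Hoeffding tail bound for a martingale with increments
    bounded by 1: P(|S_n| >= t) <= 2 * exp(-t**2 / (2 * n))."""
    return 2 * math.exp(-t * t / (2 * n))

random.seed(5)
n, t, trials = 1000, 100, 2000
# A +/-1 simple random walk is a martingale with bounded increments.
hits = sum(abs(sum(random.choice((-1, 1)) for _ in range(n))) >= t
           for _ in range(trials))
empirical = hits / trials            # observed tail probability
bound = azuma_bound(n, t)            # 2 * e**-5, about 0.013
```

The empirical tail probability sits comfortably below the bound, as expected: the inequality is not tight, but it holds under very weak assumptions, which is the point of the survey.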
Sharp transition towards shared vocabularies in multi-agent systems
 J. Stat. Mech
, 2006
Abstract

Cited by 40 (7 self)
What processes can explain how very large populations are able to converge on the use of a particular word or grammatical construction without global coordination? Answering this question helps to understand why new language constructs usually propagate along an S-shaped curve with a rather sudden transition towards global agreement. It also helps to analyze and design new technologies that support or orchestrate self-organizing communication systems, such as recent social tagging systems for the web. The article introduces and studies a microscopic model of communicating autonomous agents performing language games without any central control. We show that the system undergoes a disorder/order transition, going through a sharp symmetry-breaking process to reach a shared set of conventions. Before the transition, the system builds up nontrivial scale-invariant correlations, for instance in the distribution of competing synonyms, which display a Zipf-like law. These correlations make the system ready for the transition towards shared conventions, which, observed on the timescale of collective behaviors, becomes sharper and sharper with system size. This surprising result not only explains why human language can scale up to very large populations but also suggests ways to optimize artificial semiotic dynamics.
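The microscopic model is a naming game, which can be sketched in a few lines: at each step a random speaker utters a word from its inventory (inventing one if the inventory is empty); on success both parties collapse their inventories to that word, on failure the hearer merely learns it. This is a minimal rendering of that game with illustrative names and parameters, not the authors' code.

```python
import random

def naming_game(n_agents, n_steps, rng=random):
    """Run a minimal naming game and return the agents' inventories.

    Each step: pick a random speaker/hearer pair; the speaker utters a
    word from its inventory (inventing a fresh one if empty).  On success
    both collapse to that word; on failure the hearer adds it.
    """
    inventories = [set() for _ in range(n_agents)]
    next_word = 0
    for _ in range(n_steps):
        s, h = rng.sample(range(n_agents), 2)
        if not inventories[s]:
            inventories[s].add(next_word)   # invent a brand-new word
            next_word += 1
        word = rng.choice(sorted(inventories[s]))
        if word in inventories[h]:          # success: both align on the word
            inventories[s] = {word}
            inventories[h] = {word}
        else:                               # failure: hearer learns the word
            inventories[h].add(word)
    return inventories

random.seed(1)
invs = naming_game(n_agents=20, n_steps=20000)
```

Run long enough, every agent ends up holding the same single word: the shared convention whose sudden emergence the paper analyzes. Tracking the success rate over time makes the S-shaped transition visible.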
Notes on the occupancy problem with infinitely many boxes: general asymptotics and power laws
, 2008
Competitive Parallel Disk Prefetching and Buffer Management
, 1997
Abstract

Cited by 23 (11 self)
We provide a competitive analysis framework for online prefetching and buffer management algorithms in parallel I/O systems, using a read-once model of block references. This has widespread applicability to key I/O-bound applications such as external merging and concurrent playback of multiple video streams. Two realistic lookahead models, global lookahead and local lookahead, are defined. Algorithms NOM and GREED based on these two forms of lookahead are analyzed for shared buffer and distributed buffer configurations, both of which occur frequently in existing systems. An important aspect of our work is that we show how to implement both the models of lookahead in practice using the simple techniques of forecasting and flushing.
Marriage, honesty, and stability
 In Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA)
, 2005
Abstract

Cited by 22 (4 self)
Many centralized two-sided markets form a matching between participants by running a stable marriage algorithm. It is a well-known fact that no matching mechanism based on a stable marriage algorithm can guarantee truthfulness as a dominant strategy for participants. However, as we will show in this paper, in a probabilistic setting where the preference lists of one side of the market are composed of only a constant (independent of the size of the market) number of entries, each drawn from an arbitrary distribution, the number of participants that have more than one stable partner is vanishingly small. This proves (and generalizes) a conjecture of Roth and Peranson [23]. As a corollary of this result, we show that, with high probability, the truthful strategy is the best response for a given player when the other players are truthful. We also analyze equilibria of the deferred acceptance stable marriage game. We show that the game with complete information has an equilibrium in which a fraction of the strategies are truthful in expectation. In the more realistic setting of a game of incomplete information, we will show that the set of truthful strategies form a
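The deferred acceptance mechanism the abstract refers to is the classical Gale-Shapley procedure; a sketch with the paper's "constant-length preference list" setting is below. The function names and the choice of random instance are illustrative, and a man whose short list is exhausted simply stays unmatched.

```python
import random

def deferred_acceptance(men_prefs, women_rank):
    """Men-proposing deferred acceptance (Gale-Shapley).

    men_prefs[m] is m's (possibly short) preference list of women;
    women_rank[w][m] is w's rank of m (lower = better).
    Returns the resulting matching as a dict {man: woman}.
    """
    next_choice = [0] * len(men_prefs)   # index of each man's next proposal
    engaged_to = {}                      # woman -> her current man
    free = list(range(len(men_prefs)))
    while free:
        m = free.pop()
        if next_choice[m] >= len(men_prefs[m]):
            continue                     # short list exhausted: m stays unmatched
        w = men_prefs[m][next_choice[m]]
        next_choice[m] += 1
        if w not in engaged_to:
            engaged_to[w] = m
        elif women_rank[w][m] < women_rank[w][engaged_to[w]]:
            free.append(engaged_to[w])   # w trades up; her old man is freed
            engaged_to[w] = m
        else:
            free.append(m)               # w rejects m; he proposes again later
    return {m: w for w, m in engaged_to.items()}

# Random market with constant-length (k = 3) preference lists on one side.
random.seed(2)
n, k = 30, 3
men_prefs = [random.sample(range(n), k) for _ in range(n)]
women_rank = [{m: random.random() for m in range(n)} for _ in range(n)]
matching = deferred_acceptance(men_prefs, women_rank)
```

In instances like this one, the paper's result says that with high probability almost no participant has more than one stable partner, which is what makes truthful reporting an approximate best response.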
Local limit theorems for finite and infinite urn models
 Ann. Probab
, 2007
Abstract

Cited by 19 (2 self)
Local limit theorems are derived for the number of occupied urns in general finite and infinite urn models under the minimum condition that the variance tends to infinity. Our results represent an optimal improvement over previous ones for normal approximation. A classical theorem of Rényi [29] for the number of empty boxes, denoted by μ0(n, M), in a sequence of n random allocations of indistinguishable balls into M boxes with equal probability 1/M, can be stated as follows: if the variance of μ0(n, M) tends to infinity with n, then μ0(n, M) is asymptotically normally distributed. This result, seldom stated in this form in the literature,
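The quantity μ0(n, M) in Rényi's theorem is easy to simulate, and its mean has the exact closed form E[μ0] = M(1 − 1/M)^n, since each box independently stays empty with probability (1 − 1/M)^n. The sketch below checks that, with illustrative parameters:

```python
import random

def empty_boxes(n_balls, n_boxes, rng=random):
    """One uniform random allocation of n_balls into n_boxes; returns
    the number of boxes left empty (Renyi's mu_0(n, M))."""
    occupied = set(rng.randrange(n_boxes) for _ in range(n_balls))
    return n_boxes - len(occupied)

random.seed(3)
M, n, trials = 100, 100, 2000
mean_empty = sum(empty_boxes(n, M) for _ in range(trials)) / trials
expected = M * (1 - 1/M) ** n    # exact mean, roughly M/e when n = M
```

Plotting the simulated distribution of μ0 against a normal curve with the matching mean and variance illustrates the asymptotic normality that the local limit theorems refine.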
Generalizations of Polya’s urn problem
 Annals of Combinatorics
, 2003
Abstract

Cited by 18 (1 self)
Abstract. We consider generalizations of the classical Polya urn problem: Given finitely many bins each containing one ball, suppose that additional balls arrive one at a time. For each new ball, with probability p, create a new bin and place the ball in that bin; with probability 1 − p, place the ball in an existing bin, such that the probability the ball is placed in a bin is proportional to m^γ, where m is the number of balls in that bin. For p = 0, the number of bins is fixed and finite, and the behavior of the process depends on whether γ is greater than, equal to, or less than 1. We survey the known results and give new proofs for all three cases. We then consider the case p > 0. When γ = 1, this is equivalent to the so-called preferential attachment scheme which leads to a power-law distribution for bin sizes. When γ > 1, we prove that a single bin dominates, i.e., as the number of balls goes to infinity, the probability converges to 1 that any new ball either goes into that bin or creates a new bin. When p > 0 and γ < 1, we show that under the assumption that certain limits exist, the fraction of bins having m balls shrinks exponentially as a function of m. We then discuss further generalizations and pose several open problems.
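The (p, γ) process defined in the abstract translates directly into a simulation, which makes the three regimes visible: γ = 1 gives preferential attachment with power-law bin sizes, while γ > 1 lets a single bin run away with nearly all non-new balls. The function name and parameter values below are illustrative.

```python
import random

def polya_urn(n_balls, p, gamma, rng=random):
    """Generalized Polya urn: start with one bin holding one ball; each
    subsequent ball opens a new bin with probability p, and otherwise
    joins bin i with probability proportional to (size of bin i)**gamma.
    Returns the list of final bin sizes."""
    bins = [1]
    for _ in range(n_balls - 1):
        if rng.random() < p:
            bins.append(1)                    # open a new bin
        else:
            weights = [b ** gamma for b in bins]
            total = sum(weights)
            r = rng.uniform(0, total)
            acc = 0.0
            for i, w in enumerate(weights):
                acc += w
                if r < acc:
                    bins[i] += 1
                    break
            else:
                bins[-1] += 1                 # guard against float round-off
    return bins

random.seed(4)
rich = polya_urn(5000, p=0.1, gamma=1.5)      # gamma > 1: one bin dominates
flat = polya_urn(5000, p=0.1, gamma=1.0)      # gamma = 1: preferential attachment
```

Comparing the largest bin in each run shows the dominance phenomenon the paper proves for γ > 1: almost every ball that does not open a new bin lands in the runaway bin.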