Results 1-10 of 60
Exact Sampling with Coupled Markov Chains and Applications to Statistical Mechanics
, 1996
Cited by 548 (13 self)
For many applications it is useful to sample from a finite set of objects in accordance with some particular distribution. One approach is to run an ergodic (i.e., irreducible aperiodic) Markov chain whose stationary distribution is the desired distribution on this set; after the Markov chain has run for M steps, with M sufficiently large, the distribution governing the state of the chain approximates the desired distribution. Unfortunately it can be difficult to determine how large M needs to be. We describe a simple variant of this method that determines on its own when to stop, and that outputs samples in exact accordance with the desired distribution. The method uses couplings, which have also played a role in other sampling schemes; however, rather than running the coupled chains from the present into the future, one runs from a distant point in the past up until the present, where the distance into the past that one needs to go is determined during the running of the al...
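The idea in the abstract above, running coupled chains from a point in the past up to the present and going further back until they agree, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the chain is a lazy reflecting random walk on {0, ..., n} (whose stationary distribution is uniform), and the names `update` and `cftp` are hypothetical. Because the update rule is monotone, it is enough to track the chains started from the lowest and highest states.

```python
import random

def update(x, u, n):
    """Monotone update rule (assumed example chain): step up if u < 0.5,
    otherwise step down, clamped to the range [0, n]."""
    return min(x + 1, n) if u < 0.5 else max(x - 1, 0)

def cftp(n, rng):
    """Return one sample from the stationary (uniform) distribution on 0..n
    by coupling from the past."""
    us = []  # us[k] drives the step from time -(k+1) to time -k
    T = 1    # how far into the past the current attempt starts
    while True:
        # Extend the randomness further into the past; the random numbers
        # already drawn for times closer to the present are reused.
        while len(us) < T:
            us.append(rng.random())
        lo, hi = 0, n  # chains started from the bottom and top states
        for k in range(T - 1, -1, -1):  # run from time -T up to time 0
            lo = update(lo, us[k], n)
            hi = update(hi, us[k], n)
        if lo == hi:   # all starting states have coalesced by time 0
            return lo
        T *= 2         # otherwise start from twice as far back

rng = random.Random(0)
samples = [cftp(4, rng) for _ in range(2000)]
counts = [samples.count(s) for s in range(5)]
```

The stopping time is determined by the run itself: the loop ends as soon as the top and bottom chains meet at time 0. Reusing the stored random numbers, rather than redrawing them, is what keeps the output exact in this sketch.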
Algebraic Algorithms for Sampling from Conditional Distributions
 Annals of Statistics
, 1995
Cited by 264 (20 self)
We construct Markov chain algorithms for sampling from discrete exponential families conditional on a sufficient statistic. Examples include generating tables with fixed row and column sums and higher dimensional analogs. The algorithms involve finding bases for associated polynomial ideals and so an excursion into computational algebraic geometry.
An Interruptible Algorithm for Perfect Sampling via Markov Chains
 Annals of Applied Probability
, 1998
Cited by 92 (7 self)
For a large class of examples arising in statistical physics known as attractive spin systems (e.g., the Ising model), one seeks to sample from a probability distribution π on an enormously large state space, but elementary sampling is ruled out by the infeasibility of calculating an appropriate normalizing constant. The same difficulty arises in computer science problems where one seeks to sample randomly from a large finite distributive lattice whose precise size cannot be ascertained in any reasonable amount of time. The Markov chain Monte Carlo (MCMC) approximate sampling approach to such a problem is to construct and run "for a long time" a Markov chain with long-run distribution π. But determining how long is long enough to get a good approximation can be both analytically and empirically difficult. Recently, Jim Propp and David Wilson have devised an ingenious and efficient algorithm to use the same Markov chains to produce perfect (i.e., exact) samples from π. However, the running t...
Inference in Curved Exponential Family Models for Networks
 Journal of Computational and Graphical Statistics
, 2006
Cited by 78 (10 self)
Network data arise in a wide variety of applications. Although descriptive statistics for networks abound in the literature, the science of fitting statistical models to complex network data is still in its infancy. The models considered in this article are based on exponential families; therefore, we refer to them as exponential random graph models (ERGMs). Although ERGMs are easy to postulate, maximum likelihood estimation of parameters in these models is very difficult. In this article, we first review the method of maximum likelihood estimation using Markov chain Monte Carlo in the context of fitting linear ERGMs. We then extend this methodology to the situation where the model comes from a curved exponential family. The curved exponential family methodology is applied to new specifications of ERGMs, proposed by Snijders et al. (2004), having nonlinear parameters to represent structural properties of networks such as transitivity and heterogeneity of degrees. We review the difficult topic of implementing likelihood ratio tests for these models, then apply all these model-fitting and testing techniques to the estimation of linear and nonlinear parameters for a collaboration network between partners in a New England law firm.
Sequential Monte Carlo methods for statistical analysis of tables
 J. Amer. Statist. Assoc
Cited by 74 (10 self)
We describe a sequential importance sampling (SIS) procedure for analyzing two-way zero–one or contingency tables with fixed marginal sums. An essential feature of the new method is that it samples the columns of the table progressively according to certain special distributions. Our method produces Monte Carlo samples that are remarkably close to the uniform distribution, enabling one to approximate closely the null distributions of various test statistics about these tables. Our method compares favorably with other existing Monte Carlo-based algorithms, and sometimes is a few orders of magnitude more efficient. In particular, compared with Markov chain Monte Carlo (MCMC)-based approaches, our importance sampling method not only is more efficient in terms of absolute running time and frees one from pondering over the mixing issue, but also provides an easy and accurate estimate of the total number of tables with fixed marginal sums, which is far more difficult for an MCMC method to achieve.
Assessing data mining results via swap randomization
 ACM Transactions on Knowledge Discovery from Data
Cited by 58 (6 self)
The problem of assessing the significance of data mining results on high-dimensional 0–1 data sets has been studied extensively in the literature. For problems such as mining frequent sets and finding correlations, significance testing can be done by, e.g., chi-square tests, or many other methods. However, the results of such tests depend only on the specific attributes and not on the dataset as a whole. Moreover, the tests are more difficult to apply to sets of patterns or other complex results of data mining. In this paper, we consider a simple randomization technique that deals with this shortcoming. The approach consists of producing random datasets that have the same row and column margins with the given dataset, computing the results of interest on the randomized instances, and comparing them against the results on the actual data. This randomization technique can be used to assess the results of many different types of data mining algorithms, such as frequent sets, clustering, and rankings. To generate random datasets with given margins, we use variations of a Markov chain approach, which is based on a simple swap operation. We give theoretical results on the efficiency of different randomization methods, and apply the swap randomization method to several well-known datasets. Our results indicate that for some datasets the structure discovered by the data mining algorithms is a random artifact, while for other datasets the discovered structure conveys meaningful information.
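The swap operation mentioned in the abstract above can be sketched as follows. This is a plain version written for illustration, not necessarily any of the variations the paper studies; the function name `swap_randomize` and the small example matrix are hypothetical. A swap picks two rows and two columns and, if the four cells form a "checkerboard" (1 0 / 0 1 or 0 1 / 1 0), flips them, which leaves every row and column sum unchanged.

```python
import random

def swap_randomize(matrix, attempts, rng):
    """Return a copy of a 0-1 matrix after `attempts` attempted swaps.
    Assumes at least two rows and two columns."""
    m = [row[:] for row in matrix]  # work on a copy, leave the input intact
    n_rows, n_cols = len(m), len(m[0])
    for _ in range(attempts):
        r1, r2 = rng.sample(range(n_rows), 2)  # two distinct rows
        c1, c2 = rng.sample(range(n_cols), 2)  # two distinct columns
        # A swap is possible only when the 2x2 submatrix is a checkerboard.
        if (m[r1][c1] == m[r2][c2] and m[r1][c2] == m[r2][c1]
                and m[r1][c1] != m[r1][c2]):
            # Exchange the two entries within each row; row sums are
            # unchanged, and the checkerboard pattern keeps column sums fixed.
            m[r1][c1], m[r1][c2] = m[r1][c2], m[r1][c1]
            m[r2][c1], m[r2][c2] = m[r2][c2], m[r2][c1]
    return m

data = [
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 1, 0, 0],
]
randomized = swap_randomize(data, 1000, random.Random(0))
```

To assess a result, one would compute the statistic of interest on many such randomized matrices and compare it with the value on the original data. How many attempted swaps are enough for the chain to mix is not addressed by this sketch.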
Minimal basis for connected Markov chain over 3 × 3 × K contingency tables with fixed two-dimensional marginals
, 2002
Cited by 50 (14 self)
We consider connected Markov chain for sampling 3 × 3 × K contingency tables having fixed two-dimensional marginal totals. Such sampling arises in performing various tests of...
Sequential importance sampling for multiway tables
 Annals of Statistics
, 2005
Cited by 36 (5 self)
We describe an algorithm for the sequential sampling of entries in multiway contingency tables with given constraints. The algorithm can be used for computations in exact conditional inference. To justify the algorithm, a theory relates sampling values at each step to properties of the associated toric ideal using computational commutative algebra. In particular, the property of interval cell counts at each step is related to exponents on lead indeterminates of a lexicographic Gröbner basis. Also, the approximation of integer programming by linear programming for sampling is related to initial terms of a toric ideal. We apply the algorithm to examples of contingency tables which appear in the social and medical sciences. The numerical results demonstrate that the theory is applicable and that the algorithm performs well.
Markov Chain Monte Carlo for Statistical Inference
 University of Washington, Center for
, 2000
Cited by 29 (0 self)
These notes provide an introduction to Markov chain Monte Carlo methods that are useful in both Bayesian and frequent...
Markov chain Monte Carlo exact tests for incomplete two-way contingency tables
, 2002
Cited by 25 (12 self)
We consider testing the quasi-independence hypothesis for two-way contingency tables which contain some structural zero cells. For sparse contingency tables where the large sample...