Results 1-10 of 14
Sampling regular graphs and a peer-to-peer network
 Proceedings of the sixteenth annual ACM-SIAM Symposium on Discrete Algorithms
, 2005
Cited by 36 (4 self)
This paper has two parts. In the first part we consider a simple Markov chain for d-regular graphs on n vertices, where d = d(n) may grow with n. We show that the mixing time of this Markov chain is bounded above by a polynomial in n and d. In the second part of the paper, a related Markov chain for d-regular graphs on a varying number of vertices is introduced, for even constant d. This is a model for a certain peer-to-peer network. We prove that the related chain has mixing time which is bounded above by a polynomial in N, the expected number of vertices, provided certain assumptions are met about the rate of arrival and departure of vertices.
Assessing data mining results via swap randomization
 ACM Transactions on Knowledge Discovery from Data
Cited by 33 (6 self)
The problem of assessing the significance of data mining results on high-dimensional 0–1 data sets has been studied extensively in the literature. For problems such as mining frequent sets and finding correlations, significance testing can be done by, e.g., chi-square tests, or many other methods. However, the results of such tests depend only on the specific attributes and not on the dataset as a whole. Moreover, the tests are more difficult to apply to sets of patterns or other complex results of data mining. In this paper, we consider a simple randomization technique that deals with this shortcoming. The approach consists of producing random datasets that have the same row and column margins with the given dataset, computing the results of interest on the randomized instances, and comparing them against the results on the actual data. This randomization technique can be used to assess the results of many different types of data mining algorithms, such as frequent sets, clustering, and rankings. To generate random datasets with given margins, we use variations of a Markov chain approach, which is based on a simple swap operation. We give theoretical results on the efficiency of different randomization methods, and apply the swap randomization method to several well-known datasets. Our results indicate that for some datasets the structure discovered by the data mining algorithms is a random artifact, while for other datasets the discovered structure conveys meaningful information.
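The swap operation mentioned in this abstract can be illustrated with a minimal sketch. It assumes a 0-1 matrix stored as a list of lists; the function name `swap_randomize` and its parameters are hypothetical and not taken from the paper, and the sketch makes no claim about how many attempts are needed for the chain to mix.

```python
import random


def swap_randomize(matrix, attempts, seed=None):
    """Return a copy of a 0-1 matrix after a number of attempted swaps.

    Hypothetical helper: each attempt picks two 1-entries (i, j) and (k, l)
    and, if (i, l) and (k, j) are both 0, moves the 1s to those positions.
    Every row sum and column sum is unchanged by such a swap.
    """
    rng = random.Random(seed)
    data = [row[:] for row in matrix]  # leave the input untouched
    ones = [(i, j) for i, row in enumerate(data)
            for j, value in enumerate(row) if value == 1]
    if len(ones) < 2:
        return data
    for _ in range(attempts):
        p, q = rng.sample(range(len(ones)), 2)
        (i, j), (k, l) = ones[p], ones[q]
        # The zero checks also rule out i == k and j == l,
        # because then one of the two cells would hold a 1.
        if data[i][l] == 0 and data[k][j] == 0:
            data[i][j] = data[k][l] = 0
            data[i][l] = data[k][j] = 1
            ones[p], ones[q] = (i, l), (k, j)
    return data
```

A statistic computed on the original data could then be compared against its values on many such randomized copies, in the spirit of the assessment the abstract describes.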
Accelerating Simulated Annealing for the Permanent and Combinatorial Counting Problems
 In Proceedings of the 17th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA)
, 2006
Cited by 15 (2 self)
We present an improved “cooling schedule” for simulated annealing algorithms for combinatorial counting problems. Under our new schedule the rate of cooling accelerates as the temperature decreases. Thus, fewer intermediate temperatures are needed as the simulated annealing algorithm moves from the high temperature (easy region) to the low temperature (difficult region). We present applications of our technique to colorings and the permanent (perfect matchings of bipartite graphs). Moreover, for the permanent, we improve the analysis of the Markov chain underlying the simulated annealing algorithm. This improved analysis, combined with the faster cooling schedule, results in an $O(n^7 \log^4 n)$ time algorithm for approximating the permanent of a 0/1 matrix.
A Sequential Algorithm for Generating Random Graphs
, 2006
Cited by 10 (0 self)
Abstract. We present a nearly-linear time algorithm for counting and randomly generating simple graphs with a given degree sequence in a certain range. For degree sequence $(d_i)_{i=1}^n$ with maximum degree $d_{\max} = O(m^{1/4-\tau})$, our algorithm generates almost uniform random graphs with that degree sequence in time $O(m\, d_{\max})$ where $m = \frac{1}{2}\sum_i d_i$ is the number of edges in the graph and $\tau$ is any positive constant. The fastest known algorithm for uniform generation of these graphs [35] has a running time of $O(m^2 d_{\max}^2)$. Our method also gives an independent proof of McKay’s estimate [34] for the number of such graphs. We also use sequential importance sampling to derive Fully Polynomial-time Randomized Approximation Schemes (FPRAS) for counting and uniformly generating random graphs for the same range of $d_{\max} = O(m^{1/4-\tau})$. Moreover, we show that for $d = O(n^{1/2-\tau})$, our algorithm can generate an asymptotically uniform d-regular graph. Our results improve the previous bound of $d = O(n^{1/3-\tau})$ due to Kim and Vu [31] for regular graphs.
A local switch Markov chain on given degree graphs with application in connectivity of peer-to-peer networks
 In Proc. of FOCS’06
, 2006
Cited by 9 (0 self)
We study a switch Markov chain on regular graphs, where switches are allowed only between links that are at distance 2; we call this the Flip. The motivation for studying the Flip Markov chain arises in the context of unstructured peer-to-peer networks, which constantly perform such flips in an effort to randomize. We show that the Flip Markov chain on regular graphs is rapidly mixing, thus justifying this widely used peer-to-peer networking practice. Our mixing argument uses the Markov chain comparison technique. In particular, we extend this technique to embedding arguments where the compared Markov chains are defined on different state spaces. We give several conditions which generalize our results beyond regular graphs.
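A local switch of the kind this abstract describes might look like the sketch below. It rests on an assumption about what the move is: the two switched links are taken to be the end edges of a path a-b-c-d, so that (a, b) and (c, d) are replaced by (a, c) and (b, d). The abstract does not give the exact move or its transition probabilities, and the name `flip_step` and the adjacency-set representation are hypothetical.

```python
import random


def flip_step(adj, rng):
    """Try one flip on a path a-b-c-d; return True if the graph changed.

    adj maps each vertex to the set of its neighbours (undirected, simple).
    Assumed move: drop edges (a, b) and (c, d), add edges (a, c) and (b, d).
    The middle edge (b, c) stays, so degrees and connectivity are preserved.
    """
    b = rng.choice(sorted(adj))
    if not adj[b]:
        return False
    c = rng.choice(sorted(adj[b]))
    a_options = sorted(adj[b] - {c})
    d_options = sorted(adj[c] - {b})
    if not a_options or not d_options:
        return False
    a = rng.choice(a_options)
    d = rng.choice(d_options)
    # Reject moves that would create a parallel edge or change nothing.
    if a == d or c in adj[a] or d in adj[b]:
        return False
    adj[a].remove(b)
    adj[b].remove(a)
    adj[c].remove(d)
    adj[d].remove(c)
    adj[a].add(c)
    adj[c].add(a)
    adj[b].add(d)
    adj[d].add(b)
    return True
```

Because only vertices within a short path of each other are involved, a node in a network could in principle carry out such a move using local information only, which fits the peer-to-peer motivation stated in the abstract.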
On the number of matrices and a random matrix with prescribed row and column sums and 0-1 entries, preprint arXiv:0806.1480
, 2008
Cited by 7 (2 self)
Abstract. We consider the set $\Sigma(R, C)$ of all $m \times n$ matrices having 0-1 entries and prescribed row sums $R = (r_1, \ldots, r_m)$ and column sums $C = (c_1, \ldots, c_n)$. We prove an asymptotic estimate for the cardinality $|\Sigma(R, C)|$ via the solution to a convex optimization problem. We show that if $\Sigma(R, C)$ is sufficiently large, then a random matrix $D \in \Sigma(R, C)$ sampled from the uniform probability measure in $\Sigma(R, C)$ with high probability is close to a particular matrix $Z = Z(R, C)$ that maximizes the sum of entropies of entries among all matrices with row sums R, column sums C and entries between 0 and 1. Similar results are obtained for 0-1 matrices with prescribed row and column sums and assigned zeros in some positions.
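Read as an optimization problem, the matrix Z described in this abstract could be written out as follows. This is a sketch under assumptions: the abstract does not spell out the entropy function, so the standard binary entropy H is assumed here, and the entry names $z_{ij}$ are introduced only for illustration.

```latex
\begin{aligned}
Z(R, C) \;=\; \arg\max_{z}\ & \sum_{i=1}^{m} \sum_{j=1}^{n} H(z_{ij}),
  \qquad H(x) = x \ln\frac{1}{x} + (1 - x) \ln\frac{1}{1 - x}, \\
\text{subject to}\ & \sum_{j=1}^{n} z_{ij} = r_i \quad (i = 1, \ldots, m), \\
& \sum_{i=1}^{m} z_{ij} = c_j \quad (j = 1, \ldots, n), \\
& 0 \le z_{ij} \le 1 .
\end{aligned}
```

Under this reading the objective is concave and the constraints are linear, which is consistent with the abstract's reference to a convex optimization problem.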
Approximately counting integral flows and cell-bounded contingency tables
 STOC’05: PROCEEDINGS OF THE 37TH ANNUAL ACM SYMPOSIUM ON THEORY OF COMPUTING, ACM
, 2005
Cited by 6 (0 self)
We consider the problem of approximately counting integral flows in a network. We show that there is an fpras based on volume estimation if all capacities are sufficiently large, generalising a result of Dyer, Kannan and Mount (1997). We apply this to approximating the number of contingency tables with prescribed cell bounds when the number of rows is constant, but the row sums, column sums and cell bounds may be arbitrary. We provide an fpras for this problem via a combination of dynamic programming and volume estimation. This generalises an algorithm of Cryan and Dyer (2002) for standard contingency tables, but the analysis here is considerably more intricate.
Form and Content
, 1973
Cited by 5 (0 self)
The HOL Light theorem prover can be difficult to get started with. While the manual is fairly detailed and comprehensive, the large amount of background information that has to be absorbed before the user can do anything interesting is intimidating. Here we give an alternative ‘quick start’ guide, aimed at teaching basic use of the system quickly by means of a graded set of examples. Some readers may find it easier to absorb; those who do not are referred after all to the
Negative Examples for Sequential Importance Sampling of Binary Contingency Tables. Submitted. Available from Mathematics arXiv math.ST/0606650
Cited by 5 (0 self)
Abstract. The sequential importance sampling (SIS) algorithm has gained considerable popularity for its empirical success. One of its noted applications is to the binary contingency tables problem, an important problem in statistics, where the goal is to estimate the number of 0/1 matrices with prescribed row and column sums. We give a family of examples in which the SIS procedure, if run for any subexponential number of trials, will underestimate the number of tables by an exponential factor. This result holds for any of the usual design choices in the SIS algorithm, namely the ordering of the columns and rows. These are apparently the first theoretical results on the efficiency of the SIS algorithm for binary contingency tables. Finally, we present experimental evidence that the SIS algorithm is efficient for row and column sums that are regular. Our work is a first step in determining rigorously the class of inputs for which SIS is effective.
Approximately Counting Embeddings into Random Graphs
, 806
Cited by 1 (0 self)
Let H be a graph, and let $C_H(G)$ be the number of (subgraph isomorphic) copies of H contained in a graph G. We investigate the fundamental problem of estimating $C_H(G)$. Previous results cover only a few specific instances of this general problem, for example, the case when H has degree at most one (monomer-dimer problem). In this paper, we present the first general subcase of the subgraph isomorphism counting problem which is almost always efficiently approximable. The results rely on a new graph decomposition technique. Informally, the decomposition is a labeling of the vertices such that every edge is between vertices with different labels and for every vertex all neighbors with a higher label have identical labels. The labeling implicitly generates a sequence of bipartite graphs which permits us to break the problem of counting embeddings of large subgraphs into that of counting embeddings of small subgraphs. Using this method, we present a simple randomized algorithm for the counting problem. For all decomposable graphs H and all graphs G, the algorithm is an unbiased estimator. Furthermore, for all graphs H having a decomposition where each of the bipartite graphs generated is small and almost all graphs G, the algorithm is a fully polynomial randomized approximation scheme. We show that the graph classes of H for which we obtain a fully polynomial randomized approximation scheme for almost all G include graphs of degree at most two, bounded-degree forests, bounded-width grid graphs, subdivisions of bounded-degree graphs, and major subclasses of outerplanar graphs, series-parallel graphs and planar graphs, whereas unbounded-width grid graphs are excluded. Additionally, our general technique can easily be applied to proving many more similar results.
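The informal labeling condition in this abstract can be checked mechanically. The sketch below tests only the two stated properties (edges join different labels, and each vertex's higher-labeled neighbors all share one label); the function name `is_valid_labeling` and the dictionary representation are hypothetical, and the paper's formal definition may impose further requirements.

```python
def is_valid_labeling(adj, label):
    """Check the two informal conditions on a vertex labeling.

    adj maps each vertex to an iterable of its neighbours;
    label maps each vertex to a comparable label (for example an int).
    """
    for v, neighbours in adj.items():
        higher = set()
        for u in neighbours:
            if label[u] == label[v]:
                return False  # an edge joins two equally labelled vertices
            if label[u] > label[v]:
                higher.add(label[u])
        if len(higher) > 1:
            return False  # higher-labelled neighbours disagree on their label
    return True
```

For instance, a path whose vertices alternate between two labels passes the check, while no labeling of a triangle can: its three labels must be distinct, so the lowest-labeled vertex sees two different higher labels.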