## Sequential importance sampling for multiway tables (2005)

Venue: | Annals of Statistics |

Citations: | 23 - 3 self |

### BibTeX

@ARTICLE{Chen05sequentialimportance,

author = {Yuguo Chen and Ian H. Dinwoodie and Seth Sullivant},

title = {Sequential importance sampling for multiway tables},

journal = {Annals of Statistics},

year = {2005},

volume = {34},

pages = {523--545}

}

### Abstract

We describe an algorithm for the sequential sampling of entries in multiway contingency tables with given constraints. The algorithm can be used for computations in exact conditional inference. To justify the algorithm, a theory relates sampling values at each step to properties of the associated toric ideal using computational commutative algebra. In particular, the property of interval cell counts at each step is related to exponents on lead indeterminates of a lexicographic Gröbner basis. Also, the approximation of integer programming by linear programming for sampling is related to initial terms of a toric ideal. We apply the algorithm to examples of contingency tables which appear in the social and medical sciences. The numerical results demonstrate that the theory is applicable and that the algorithm performs well.
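The sequential idea in the abstract can be sketched in a much simpler setting than the paper's. The following is a minimal illustration, assuming a two-way table with fixed row and column sums, where the feasible interval for each cell has a closed form; the multiway setting described above obtains such bounds by integer or linear programming instead. The uniform proposal on each interval and the names `sample_table` and `estimate_count` are assumptions made for this sketch, not the authors' implementation.

```python
import random


def sample_table(row_sums, col_sums, rng):
    """Fill a two-way table cell by cell, column by column.

    Returns (table, q), where q is the probability with which the
    sequential proposal generated this particular table.
    Assumes sum(row_sums) == sum(col_sums).
    """
    rows = list(row_sums)  # remaining (unfilled) part of each row total
    n_rows, n_cols = len(row_sums), len(col_sums)
    table = [[0] * n_cols for _ in range(n_rows)]
    q = 1.0
    for j, col_total in enumerate(col_sums):
        remaining = col_total  # part of column j still to be placed
        for i in range(n_rows):
            capacity_below = sum(rows[i + 1:])
            # Feasible values for cell (i, j) form an interval [lo, hi].
            lo = max(0, remaining - capacity_below)
            hi = min(rows[i], remaining)
            value = rng.randint(lo, hi)  # uniform proposal (an assumption)
            q /= hi - lo + 1
            table[i][j] = value
            rows[i] -= value
            remaining -= value
    return table, q


def estimate_count(row_sums, col_sums, n_samples=2000, seed=0):
    """Importance-sampling estimate of the number of tables with these margins.

    With the uniform distribution on tables as target, the average of 1/q
    is an unbiased estimate of the number of tables.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        _, q = sample_table(row_sums, col_sums, rng)
        total += 1.0 / q
    return total / n_samples
```

For example, `estimate_count([3, 2, 4], [4, 5])` returns an estimate of how many nonnegative integer tables share those margins; the quality of such an estimate depends on how close the proposal is to the target distribution.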

### Citations

331 | R: A language and environment for statistical computing - R Development Core Team - 2004 |

231 |
Network tomography: Estimating source-destination traffic intensities from link data
- Vardi
- 1996
Citation Context ..., 6, 13]. A more general problem is sampling from nonnegative integer lattice points. This includes contingency tables, and further applications such as Monte Carlo EM algorithms with incomplete data [31] and Bayesian computation of posterior distributions [30]. Markov chain Monte Carlo (MCMC) has been a popular technique for generating random samples from tables with given constraints. It is usually ... |

201 |
Sequential imputations and Bayesian missing data problems
- Kong, Liu, et al.
- 1994
Citation Context ...umber of i.i.d. samples from the target distribution that are needed to give the same standard error for $\hat{\mu}$ as $N$ importance samples. A rough approximation for this number is the effective sample size [24], $\mathrm{ESS} = \frac{N}{1 + \mathrm{cv}^2}$ (5), where the coefficient of variation (cv) is defined as $\mathrm{cv}^2 = \frac{\mathrm{var}_q\{p(n)/q(n)\}}{E_q^2\{p(n)/q(n)\}}$ (6). Accurate estimation generally requires a low $\mathrm{cv}^2$, that is, $q(n)$ must be sufficie... |
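The effective sample size and squared coefficient of variation quoted in this context can be computed directly from a list of importance weights p(n)/q(n). The sketch below is illustrative only: the function names are hypothetical, and the variance is the plain population variance of the sampled weights, which is one reasonable estimate of the variance under q.

```python
def cv_squared(weights):
    """Squared coefficient of variation of the importance weights p(n)/q(n)."""
    n = len(weights)
    mean = sum(weights) / n
    # Population variance of the sampled weights (an estimation choice).
    var = sum((w - mean) ** 2 for w in weights) / n
    return var / mean ** 2


def effective_sample_size(weights):
    """ESS = N / (1 + cv^2), with N the number of importance samples."""
    return len(weights) / (1.0 + cv_squared(weights))
```

Equal weights give a cv² of zero and an ESS equal to the number of samples; the more uneven the weights, the smaller the ESS.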

161 | Solving systems of polynomial equations
- Sturmfels
- 2002
Citation Context ...ly as vector increments, regardless of the actual value of t. In other words, a Markov basis always exists independently of the actual values of the linear constraints. The second fundamental result ([29], Theorem 8.14) is that a collection of moves will connect two tables n and m if $x^n - x^m \in I$, where $I$ is the ideal generated by the collection of moves. This is used to show connectivity for subcollecti... |
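What it means for a collection of moves to connect tables can be checked by brute force on a tiny case. The sketch below assumes a two-way table with fixed row and column sums and uses the basic moves that add +1, -1, -1, +1 on a 2 × 2 subtable, which are known to form a Markov basis for that model. It does not test ideal membership; it only compares the set reached by the moves with the full set of tables sharing the margins. All function names are hypothetical.

```python
from collections import deque
from itertools import combinations, product


def basic_moves(n_rows, n_cols):
    """All +1/-1 moves supported on a 2 x 2 subtable (both signs)."""
    moves = []
    for i, k in combinations(range(n_rows), 2):
        for j, l in combinations(range(n_cols), 2):
            for sign in (1, -1):
                move = [[0] * n_cols for _ in range(n_rows)]
                move[i][j] = move[k][l] = sign
                move[i][l] = move[k][j] = -sign
                moves.append(tuple(map(tuple, move)))
    return moves


def reachable(start, moves):
    """Breadth-first search over nonnegative tables reachable from start."""
    start = tuple(map(tuple, start))
    seen = {start}
    queue = deque([start])
    while queue:
        table = queue.popleft()
        for move in moves:
            nxt = tuple(
                tuple(a + b for a, b in zip(t_row, m_row))
                for t_row, m_row in zip(table, move)
            )
            if nxt not in seen and all(x >= 0 for row in nxt for x in row):
                seen.add(nxt)
                queue.append(nxt)
    return seen


def fiber(row_sums, col_sums):
    """Brute-force set of all nonnegative integer tables with these margins."""
    n_rows, n_cols = len(row_sums), len(col_sums)
    tables = set()
    for cells in product(range(max(row_sums) + 1), repeat=n_rows * n_cols):
        table = tuple(
            tuple(cells[i * n_cols:(i + 1) * n_cols]) for i in range(n_rows)
        )
        if (tuple(map(sum, table)) == tuple(row_sums)
                and tuple(map(sum, zip(*table))) == tuple(col_sums)):
            tables.add(table)
    return tables
```

Comparing `reachable(start, basic_moves(2, 3))` with `fiber((2, 2), (1, 1, 2))` for a starting table with those margins shows whether the moves connect that particular fiber. The enumeration is exponential in the number of cells, so this is only practical for very small tables.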

139 |
Performing the exact test of Hardy-Weinberg proportions for multiple alleles
- Guo, Thompson
- 1992
Citation Context ...bts about the validity of asymptotic methods. A classical application is testing for Hardy–Weinberg equilibrium with multiple alleles, where some alleles may be quite rare and result in sparse tables [20]. Other applications are described in [2, 6, 13]. A more general problem is sampling from nonnegative integer lattice points. This includes contingency tables, and further applications such as Monte C... |

110 | A survey of exact inference for contingency tables - Agresti - 1992 |

91 | Bayesian Inference on Network Traffic Using Link Count Data
- Tebaldi, West
- 1996
Citation Context ...tive integer lattice points. This includes contingency tables, and further applications such as Monte Carlo EM algorithms with incomplete data [31] and Bayesian computation of posterior distributions [30]. Markov chain Monte Carlo (MCMC) has been a popular technique for generating random samples from tables with given constraints. It is usually easy to program, does not require a lot of memory, and ha... |

66 |
Gröbner Bases and Convex
- Sturmfels
- 1996
Citation Context ...deal, so applying Lemma 4.1 in sequence is immediate. With a subbasis, however, one must add a technical condition involving saturation to get the sequential application of Lemma 4.1. Saturation (see [28], page 113 or [25], page 215) is an algebraic procedure that enlarges an ideal. In our case the ideal will correspond to a collection of Markov moves possibly less than a full Markov basis. If I is an... |

55 |
Testing for independence in a two-way table: new interpretations of the chi-square statistic (with discussion)
- Diaconis, Efron
- 1995
Citation Context ...ds. A classical application is testing for Hardy–Weinberg equilibrium with multiple alleles, where some alleles may be quite rare and result in sparse tables [20]. Other applications are described in [2, 6, 13]. A more general problem is sampling from nonnegative integer lattice points. This includes contingency tables, and further applications such as Monte Carlo EM algorithms with incomplete data [31] and... |

51 | Sequential Monte Carlo methods for statistical analysis of tables
- Chen, Diaconis, et al.
Citation Context ...ds. A classical application is testing for Hardy–Weinberg equilibrium with multiple alleles, where some alleles may be quite rare and result in sparse tables [20]. Other applications are described in [2, 6, 13]. A more general problem is sampling from nonnegative integer lattice points. This includes contingency tables, and further applications such as Monte Carlo EM algorithms with incomplete data [31] and... |

50 | Buchberger algorithm and integer programming - Conti, Traverso - 1991 |

49 |
A fast procedure for model search in multidimensional contingency tables
- Edwards, Havránek
- 1985
Citation Context ...tion on the model [RS], [RA], [RO], [SO], [SA], [OA]. EXAMPLE 7.3. Consider the 6-way binary Czech autoworker data in Table 3 from a prospective study of probable risk factors for coronary thrombosis [18]. There are 1,841 men in a car factory involved in the study. Here A, B, C, D, E and F indicate different risk factors. One reasonable model is given by [ACDEF], [ABDEF], [ABCDE], [BCDF], [ABCF], [BCE... |

45 | Log-Linear Models
- Christensen
- 1990
Citation Context ... method gave $\mathrm{cv}^2$ of 0.5, and the estimated p-value for the exact goodness-of-fit test [defined by equations (1) and (2)] is 0.04. Example 7.2. Consider the 4-way abortion opinion data (Table 2) from [8], page 129. The observations are classified according to race, sex, age and opinion. There are three different opinions: yes means supporting legalized abortion, no means opposing legalized abortion, ... |

42 |
Generalized Monte Carlo significance tests
- Besag, Clifford
- 1989
Citation Context ...ds. A classical application is testing for Hardy–Weinberg equilibrium with multiple alleles, where some alleles may be quite rare and result in sparse tables [20]. Other applications are described in [2, 6, 13]. A more general problem is sampling from nonnegative integer lattice points. This includes contingency tables, and further applications such as Monte Carlo EM algorithms with incomplete data [31] and... |

34 |
An algorithm to calculate the lower and upper bounds of the elements of an array given its marginals
- Buzzigoli, Giusti
- 1999
Citation Context ...ange which can lead to errors. The program that we embedded into the sampling code and that worked well is lpSolve [1]. A third way to approximate the intervals is the shuttle algorithm, described in [5] and [16]. This is an iterative method that usually does not give exact IP results, but it has two advantages in special cases: it is fast and easy to program, and it can be implemented without explic... |

22 | Bounds for cell entries in contingency tables induced by fixed marginal totals
- Dobra, Fienberg
Citation Context ...ch can lead to errors. The program that we embedded into the sampling code and that worked well is lpSolve [1]. A third way to approximate the intervals is the shuttle algorithm, described in [5] and [16]. This is an iterative method that usually does not give exact IP results, but it has two advantages in special cases: it is fast and easy to program, and it can be implemented without explicitly cons... |

22 | Note on an exact treatment of contingency, goodness of fit and other problems of significance - Freeman, Halton - 1951 |

21 |
a computer algebra system for polynomial computations, URL http://www.singular.uni-kl.de
- Greuel, Pfister, et al.
Citation Context ...t to verify the conditions of Sections 3, 4 and 5 for a particular example is to attempt to compute the toric ideal $I_A$. For this we have used the toric library toric.lib in the free software Singular [19] and the groebner command in 4ti2 [21]. The software 4ti2 was used to construct constraint matrices for several examples. The operations of saturation and quotient (“:”) that figure in the results of Sec... |

18 | Computing the integer programming gap - Hoşten, Sturmfels - 2003 |

16 | Computational commutative algebra - Kreuzer, Robbiano - 2000 |

14 |
Buchberger algorithm and integer programming, Applied algebra, algebraic algorithms and error-correcting codes (New
- Conti, Traverso
- 1991
Citation Context ...ities $U_j - u_j$ and $l_j - L_j$. In Propositions 5.1 and 5.2 that follow, we use the relationship between lower and upper IP bounds and normal forms with respect to lex and grevlex term orders explained in [9] and stated in Algorithm 5.6 of [28], page 43. For the following proposition, let $A^{-1}_{\mathbb{Q}}[t] := \{q \in \mathbb{Q}^d_+ : Aq = t\}$, the set of nonnegative rational vectors with constraints t. Proposition 5.1. Suppose... |

13 |
4ti2 version 1.1—computation of Hilbert bases, Graver bases, toric Gröbner bases, and more
- Hemmecke, Hemmecke
- 2003
Citation Context ...ons 3, 4 and 5 for a particular example is to attempt to compute the toric ideal $I_A$. For this we have used the toric library toric.lib in the free software Singular [19] and the groebner command in 4ti2 [21]. The software 4ti2 was used to construct constraint matrices for several examples. The operations of saturation and quotient (“:”) that figure in the results of Sections 4 and 5 were done quickly in ... |

9 |
Statistical Methods in Cancer Research, 1. The Analysis of Case-Control Studies
- Breslow, Day
- 1980
Citation Context ...the 3-way case/control data (Table 1) in the 4 × 4 × 2 table from the Ille-et-Vilaine cancer study of the age 35–44 groups ([3], Appendix I). Table 1 (age 35–44 data on oesophageal cancer from [3]; rows T = 1, ..., 4, columns A = 1, ..., 4): R = 0: (60, 35, 11, 1), (13, 20, 6, 3), (7, 13, 2, 2), (8, 8, 1, 0); R = 1: (0, 0, 0, 2), (1, 3, 0, 0), (0, 1, 0, 2), (0, 0, 0, 0). The factors are Alcohol level (A), Tobacco level (T) and Response R,... |

8 |
Markov bases and structural zeros
- Rapallo
Citation Context ... does not, and this can be seen in the lex basis. Finally, Example 7.6 is an important application of sampling on lattice points that are not strictly speaking contingency tables. The work of Rapallo [27] on Markov bases and structural zeros may be useful for other examples. The starting point to verify the conditions of Sections 3, 4 and 5 for a particular example is to attempt to compute the toric i... |

7 | Data augmentation in multiway contingency tables with fixed marginal totals
- Dobra, Tebaldi, et al.
- 2006
Citation Context ...here are 1,841 men in a car factory involved in the study. Here A, B, C, D, E and F indicate different risk factors. One reasonable model is given by [ACDEF], [ABDEF], [ABCDE], [BCDF], [ABCF], [BCEF] [17]. The conditional goodness-of-fit test for this model requires fixing the three 5-way and the three 4-way s... (Table 3: 6-way Czech autoworker data from [18]) ... |

6 | Lattice points, contingency tables, and sampling
- Chen, Dinwoodie, et al.
- 2005
Citation Context ...an fill in entries in sequence and expect the range of feasible values to be an interval of integers. Examples where the sequential interval property does not hold are very sparse logistic regression [7], many 3-way tables with certain margin constraints (see [12] for the full range of difficulties with 3-way tables) and some triangular tables of genotype data when cells are sampled in certain orders... |

6 | Monte Carlo algorithms for Hardy-Weinberg proportions
- Huber, Chen, et al.
- 2006
Citation Context ...roposition 5.2 does not hold from the first cell, but it does hold after a few cells, so IP and LP give the same bounds after some initial cells. The simulation with LP produced 100% good tables. See [23] for a direct sampling strategy and some further discussion of this example. EXAMPLE 7.6. Consider a constraint matrix A of the form A = (A0|I) with 0 or 1 entries. Here I is the e × e identity matrix... |

4 | A user’s guide for LattE v1.1. Available at http://www.math.ucdavis.edu/~latte - De Loera, Haws, et al. - 2003 |

2 |
lpsolve: Open Source (Mixed-Integer
- Berkelaar, Eikland, et al.
- 2004
Citation Context ...ng a number out of the feasible range [l,u] or into a strict subset of the feasible range which can lead to errors. The program that we embedded into the sampling code and that worked well is lpSolve [1]. A third way to approximate the intervals is the shuttle algorithm, described in [5] and [16]. This is an iterative method that usually does not give exact IP results, but it has two advantages in sp... |

2 | Universality of Markov bases of slim three-way tables - DeLoera, Onn - 2004 |

2 |
Algebraic methods for sampling from conditional distributions
- Diaconis, Sturmfels
- 1998
Citation Context ...popular technique for generating random samples from tables with given constraints. It is usually easy to program, does not require a lot of memory, and has wide applicability. Diaconis and Sturmfels [14] gave algebraic characterizations of the moves necessary to run such a Markov chain. However, for some loglinear models the constraints from sufficient statistics on multiway tables make it difficult ... |

2 | Bayesian inference in incomplete multi-way tables - Dobra, West - 2003 |

2 | An extension of the Mantel-Haenszel procedure to K 2 × c contingency tables and the relation to the logit model - Sugiura, Otake - 1974 |

2 |
Computing the integer programming gap. Combinatorica
- Hoşten, Sturmfels
- 2006
Citation Context ...nearly the same answer is important, because using an IP algorithm at each step in the procedure would be much slower than using LP. A precise algebraic relationship between LP and IP is developed in [22], which gives an algorithm for finding the maximum difference between the two over all conceivable data sets. The results here may be easier to apply in some examples. In practice it is not essential ... |

1 |
Conditional expectations in network traffic estimation
- Dinwoodie
- 2000
Citation Context ...h and y is the aggregate traffic across links. The sampling method of Tebaldi and West [30] for Bayesian computation of the posterior distribution is closely related to sequential sampling. Dinwoodie [15] shows how fast sampling can be used in a Monte Carlo EM algorithm for estimating traffic rates. (Table 6: Order of cells) ... |

1 | Sampling for conditional inference on case/control data. Discussion Paper 04-29, Institute of Statistics and Decision Sciences, Duke University. Available at http://www.stat.duke.edu/papers - Chen, Dinwoodie, et al. - 2004 |

1 | Markov bases and structural zeros. Manuscript - Rapallo - 2004 |

1 |
A User’s Guide for LattE v1.1. Available at www.math.ucdavis.edu/~latte
- De Loera, Haws, et al.
- 2003
Citation Context ...s been carried out in [6], where SIS was shown to be more efficient than Markov chains for counting and testing two-way tables. Combinatorists are interested in counting tables with given constraints [11]. Counting tables is also related to conditional volume tests [13]. In our multiway examples, we found approximate counts of tables without difficulty. The exact counting software LattE [11] confirmed... |

1 |
Markov bases of three-way tables are arbitrarily complicated
- De Loera, Onn
- 2006
Citation Context ...ble values to be an interval of integers. Examples where the sequential interval property does not hold are very sparse logistic regression [7], many 3-way tables with certain margin constraints (see [12] for the full range of difficulties with 3-way tables) and some triangular tables of genotype data when cells are sampled in certain orders. Typically, there may be a problem if the moves of a Markov ... |