Results 1–10 of 29
Algebraic Algorithms for Sampling from Conditional Distributions
Annals of Statistics, 1995
Cited by 224 (19 self)
We construct Markov chain algorithms for sampling from discrete exponential families conditional on a sufficient statistic. Examples include generating tables with fixed row and column sums and higher dimensional analogs. The algorithms involve finding bases for associated polynomial ideals and so an excursion into computational algebraic geometry.
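For a two-way table, the basis moves this abstract describes reduce to adding a ±1 pattern on a 2 × 2 minor, which leaves every row and column sum unchanged. A minimal sketch of one such Markov step (illustrative only: the function name `swap_step` and the restriction to two-way tables are my assumptions, and the paper's general construction derives the moves from Gröbner bases of toric ideals):

```python
import random

def swap_step(table, rng=random.Random(0)):
    """One Markov move: pick two rows and two columns, then add the
    2x2 pattern [[+1, -1], [-1, +1]] (or its negative) if all four
    entries stay non-negative. Row and column sums are preserved."""
    m, n = len(table), len(table[0])
    i, j = rng.sample(range(m), 2)
    k, l = rng.sample(range(n), 2)
    sign = rng.choice([1, -1])
    moves = [(i, k, sign), (i, l, -sign), (j, k, -sign), (j, l, sign)]
    if all(table[r][c] + d >= 0 for r, c, d in moves):
        for r, c, d in moves:
            table[r][c] += d
    return table
```

Starting from the observed table and iterating this step gives a chain on the set of tables with the same margins; for two-way tables these swap moves connect that set, and the symmetric proposal makes the uniform distribution stationary.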
Exact conditional tests for cross-classifications: Approximation of attained significance levels
Psychometrika, 1979
Cited by 38 (2 self)
A procedure is proposed for approximating the attained significance levels of exact conditional tests. The procedure samples from the null distribution of tables having the same marginal frequencies as the observed table. Applying the approximation through a computer subroutine yields precise approximations for practically any table dimensions and sample size. Key words: contingency tables, independence, chi-square, Kruskal-Wallis, computer algorithm.
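The sampling idea can be sketched as follows: draw tables from the conditional null distribution with the observed margins (here realized by randomly pairing row labels with column labels, one standard way to sample that distribution) and report the fraction whose test statistic meets or exceeds the observed one. The function names and the choice of the chi-square statistic are illustrative assumptions, not the paper's exact subroutine:

```python
import random

def chi_square(table, expected):
    """Pearson chi-square statistic against a fixed expected table."""
    return sum((o - e) ** 2 / e
               for row_o, row_e in zip(table, expected)
               for o, e in zip(row_o, row_e))

def mc_pvalue(table, n_samples=10000, rng=random.Random(1)):
    """Monte Carlo estimate of the attained significance level of the
    chi-square test of independence, conditional on both margins."""
    m, n = len(table), len(table[0])
    row_sums = [sum(r) for r in table]
    col_sums = [sum(c) for c in zip(*table)]
    total = sum(row_sums)
    expected = [[rs * cs / total for cs in col_sums] for rs in row_sums]
    observed_stat = chi_square(table, expected)
    # Expand the column margin into one label per observation.
    col_labels = [j for j, cs in enumerate(col_sums) for _ in range(cs)]
    exceed = 0
    for _ in range(n_samples):
        rng.shuffle(col_labels)  # random pairing => null with fixed margins
        sim = [[0] * n for _ in range(m)]
        pos = 0
        for i, rs in enumerate(row_sums):
            for j in col_labels[pos:pos + rs]:
                sim[i][j] += 1
            pos += rs
        if chi_square(sim, expected) >= observed_stat - 1e-12:
            exceed += 1
    return exceed / n_samples
```

A strongly dependent table such as `[[10, 0], [0, 10]]` yields an estimated p-value near zero, while a perfectly balanced table yields one near 1.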
Associative clustering for exploring dependencies between functional genomics data sets
IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2005
Asymptotic estimates for the number of contingency tables, integer flows, and volumes of transportation polytopes
 Int. Math. Res. Notices
Cited by 16 (4 self)
We prove an asymptotic estimate for the number of m × n nonnegative integer matrices (contingency tables) with prescribed row and column sums and, more generally, for the number of integer feasible flows in a network. Similarly, we estimate the volume of the polytope of m × n nonnegative real matrices with prescribed row and column sums. Our estimates are solutions of convex optimization problems and hence can be computed efficiently. As a corollary, we show that if the row sums R = (r1, ..., rm) and column sums C = (c1, ..., cn) with r1 + ... + rm = c1 + ... + cn = N are sufficiently far from constant vectors, then, asymptotically, in the uniform probability space of m × n nonnegative integer matrices with total entry sum N, the event consisting of the matrices with row sums R and the event consisting of the matrices with column sums C are positively correlated.
Discriminative Clustering
2004
Cited by 15 (4 self)
A distributional clustering model for continuous data is reviewed, and new methods for optimizing and regularizing it are introduced and compared. Based on samples of discrete-valued auxiliary data associated with samples of the continuous primary data, the continuous data space is partitioned into Voronoi regions that are maximally homogeneous in terms of the discrete data. Then only variation in the primary data associated with variation in the discrete data affects the clustering; the discrete data "supervises" the clustering. Because the whole continuous space is partitioned, new samples can be easily clustered by the continuous part of the data alone. In experiments, the approach is shown to produce more homogeneous clusters than alternative methods. Two regularization methods are demonstrated to further improve the results: an entropy-type penalty for unequal cluster sizes, and the inclusion of a model for the marginal density of the primary data. The latter is also interpretable as a special kind of joint distribution modeling with tunable emphasis for discrimination and the marginal density.
Statistical Techniques for Language Recognition: An Introduction and Guide for Cryptanalysts
Cryptologia, 1993
Cited by 13 (2 self)
We explain how to apply statistical techniques to solve several language-recognition problems that arise in cryptanalysis and other domains. Language recognition is important in cryptanalysis because, among other applications, an exhaustive key search of any cryptosystem from ciphertext alone requires a test that recognizes valid plaintext. Written for cryptanalysts, this guide should also be helpful to others as an introduction to statistical inference on Markov chains. Modeling language as a finite stationary Markov process, we adapt a statistical model of pattern recognition to language recognition. Within this framework we consider four well-defined language-recognition problems: 1) recognizing a known language, 2) distinguishing a known language from uniform noise, 3) distinguishing unknown 0th-order noise from unknown 1st-order language, and 4) detecting non-uniform unknown language. For the second problem we give a most powerful test based on the Neyman-Pearson Lemma. For the oth...
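The second problem above (known language vs. uniform noise) admits a compact sketch: fit a first-order Markov (bigram) model to sample text and score a candidate string by the Neyman-Pearson log-likelihood ratio against uniform noise. The function names and the add-one smoothing are illustrative assumptions, not the paper's exact estimator:

```python
import math
from collections import Counter

def train_bigram(text, alphabet):
    """First-order Markov model P(next char | current char),
    with add-one smoothing over the alphabet."""
    pair_counts = Counter(zip(text, text[1:]))
    char_counts = Counter(text[:-1])
    V = len(alphabet)
    return {(a, b): (pair_counts[(a, b)] + 1) / (char_counts[a] + V)
            for a in alphabet for b in alphabet}

def log_likelihood_ratio(sample, model, alphabet):
    """Neyman-Pearson statistic: log P(sample | language model) minus
    log P(sample | uniform noise). Positive values favour the language."""
    uniform = math.log(1 / len(alphabet))
    return sum(math.log(model[(a, b)]) - uniform
               for a, b in zip(sample, sample[1:]))
```

Thresholding the ratio at zero gives the likelihood-ratio test; by the Neyman-Pearson Lemma this test is most powerful at its attained significance level.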
Maximum entropy Gaussian approximation for the number of integer points and volumes of polytopes
2009
Cited by 13 (6 self)
We describe a maximum entropy approach for computing volumes and counting integer points in polyhedra. To estimate the number of points from a particular set X ⊂ R^n in a polyhedron P ⊂ R^n, by solving a certain entropy maximization problem we construct a probability distribution on the set X such that a) the probability mass function is constant on the set P ∩ X and b) the expectation of the distribution lies in P. This allows us to apply Central Limit Theorem-type arguments to deduce computationally efficient approximations for the number of integer points, volumes, and the number of 0-1 vectors in the polytope. As an application, we obtain asymptotic formulas for volumes of multi-index transportation polytopes and for the number of multi-way contingency tables.
Bayesian Selection of Log-Linear Models
Canadian Journal of Statistics, 1995
Cited by 8 (2 self)
A general methodology is presented for finding suitable Poisson log-linear models with applications to multi-way contingency tables. Mixtures of multivariate normal distributions are used to model prior opinion when a subset of the regression vector is believed to be nonzero. This prior distribution is studied for two- and three-way contingency tables, in which the regression coefficients are interpretable in terms of odds ratios in the table. Efficient and accurate schemes are proposed for calculating the posterior model probabilities. The methods are illustrated for a large number of simulated two-way tables and for two three-way tables. These methods appear to be useful in selecting the best log-linear model and in estimating parameters of interest that reflect uncertainty in the true model. Key words and phrases: Bayes factors, Laplace method, Gibbs sampling, model selection, odds ratios. AMS subject classifications: Primary 62H17, 62F15, 62J12.
Objective Bayesian analysis of contingency tables
2002
Cited by 7 (2 self)
The statistical analysis of contingency tables is typically carried out with a hypothesis test. In the Bayesian paradigm, default priors for hypothesis tests are typically improper and cannot be used. Although proper priors are available for testing contingency tables, we show that for testing independence they can be greatly improved upon by so-called intrinsic priors. We also argue that, because there is no realistic situation that corresponds to conditioning on both margins of a contingency table, the proper analysis of an a × b contingency table should condition only on the table total or on only one of the margins. The posterior probabilities from the intrinsic priors provide reasonable answers in these cases. Examples using simulated and real data are given.