Results 1–10 of 12
Sequential Monte Carlo methods for statistical analysis of tables
J. Amer. Statist. Assoc.
Abstract

Cited by 51 (10 self)
We describe a sequential importance sampling (SIS) procedure for analyzing two-way zero-one or contingency tables with fixed marginal sums. An essential feature of the new method is that it samples the columns of the table progressively according to certain special distributions. Our method produces Monte Carlo samples that are remarkably close to the uniform distribution, enabling one to approximate closely the null distributions of various test statistics about these tables. Our method compares favorably with other existing Monte Carlo-based algorithms, and sometimes is a few orders of magnitude more efficient. In particular, compared with Markov chain Monte Carlo (MCMC)-based approaches, our importance sampling method not only is more efficient in terms of absolute running time and frees one from pondering over the mixing issue, but also provides an easy and accurate estimate of the total number of tables with fixed marginal sums, which is far more difficult for an MCMC method to achieve.
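The column-wise sampling idea can be sketched in a few lines. The proposal below is a deliberate simplification of the paper's SIS scheme: the ones in each column go to a uniformly random subset of the rows that still need ones, and averaging the importance weights 1/q gives an unbiased estimate of the number of tables with the given margins. All names here are ours, not the paper's.

```python
import math
import random

def sis_table(row_sums, col_sums, rng):
    """One SIS draw of a 0-1 table with the given margins.

    Proposal (a simplification of the paper's column-wise scheme):
    fill columns left to right; the ones in each column go to a
    uniformly random subset of the rows that still need ones.
    Returns (table, weight), where weight = 1/q is the importance
    weight against the uniform target, or (None, 0.0) on a dead end.
    """
    m, n = len(row_sums), len(col_sums)
    remaining = list(row_sums)
    table = [[0] * n for _ in range(m)]
    log_q = 0.0
    for j, k in enumerate(col_sums):
        avail = sum(1 for r in remaining if r > 0)
        if k > avail:
            return None, 0.0
        for i in range(m):
            if k == 0:
                break
            if remaining[i] == 0:
                continue           # this row is full: forced zero
            p = k / avail          # sequential uniform subset sampling
            if rng.random() < p:
                table[i][j] = 1
                remaining[i] -= 1
                log_q += math.log(p)
                k -= 1
            else:
                log_q += math.log(1.0 - p)
            avail -= 1
        if any(r > n - j - 1 for r in remaining):
            return None, 0.0       # a row needs more ones than columns left
    return table, math.exp(-log_q)

# Averaging the weights estimates the total number of valid tables.
rng = random.Random(1)
draws = [sis_table([2, 1, 1], [2, 1, 1], rng)[1] for _ in range(20000)]
est = sum(draws) / len(draws)
```

For these margins there are exactly five valid tables (easily checked by enumeration), so `est` lands near 5; the same average-of-weights device is what the abstract means by estimating the total number of tables.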
On Computing the Distribution Function for the Sum of Independent and Nonidentical Random Indicators
, 2011
Abstract

Cited by 6 (3 self)
The Poisson binomial distribution is the distribution of the sum of independent and non-identical random indicators, each following a Bernoulli distribution with its own success probability. When all success probabilities are equal, the Poisson binomial distribution reduces to a binomial distribution. The Poisson binomial distribution has many applications in areas such as reliability, survival analysis, survey sampling, and econometrics. Computing the cumulative distribution function (cdf) of the Poisson binomial distribution, however, is not straightforward. Approximation methods, such as Poisson and normal approximations, have been used in the literature, and recursive formulae have been used to compute the cdf in some areas. In this paper, we present a simple derivation of an exact, closed-form expression for the cdf of the Poisson binomial distribution. The derivation uses the discrete Fourier transform of the characteristic function of the distribution. We develop an algorithm for efficient implementation of the exact formula. Numerical studies were conducted to assess the accuracy of the developed algorithm and of the approximation methods, and we also studied the computational efficiency of the different methods. The paper concludes with a discussion of the use of the different methods in practice and some suggestions for practitioners.
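A direct transcription of the DFT-of-the-characteristic-function approach may make the idea concrete. This sketch evaluates the pmf in O(n^2) for clarity; an efficient implementation would replace the inner sum with an FFT. Function names are ours.

```python
import cmath

def poisson_binomial_pmf(p):
    """Exact pmf of X = sum of independent Bernoulli(p_j), from the
    discrete Fourier transform of the characteristic function:
    P(X=k) = (1/(n+1)) * sum_l exp(-i*w*l*k) * prod_j (1 + (exp(i*w*l) - 1) * p_j),
    with w = 2*pi/(n+1). O(n^2) as written; an FFT speeds this up."""
    n = len(p)
    w = 2.0 * cmath.pi / (n + 1)
    pmf = []
    for k in range(n + 1):
        s = 0j
        for l in range(n + 1):
            prod = 1.0 + 0j
            for pj in p:
                prod *= 1.0 + (cmath.exp(1j * w * l) - 1.0) * pj
            s += cmath.exp(-1j * w * l * k) * prod
        # tiny negative real parts can arise from rounding; clamp with abs
        pmf.append(abs((s / (n + 1)).real))
    return pmf

def poisson_binomial_cdf(p, k):
    """P(X <= k)."""
    return sum(poisson_binomial_pmf(p)[: k + 1])
```

For example, `poisson_binomial_pmf([0.1, 0.4, 0.7])` matches the brute-force enumeration of the eight outcomes to floating-point accuracy.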
Conditional inference on tables with structural zeros, Discussion Paper 0426
, 2004
Abstract

Cited by 5 (2 self)
We develop a set of sequential importance sampling (SIS) strategies for sampling nearly uniformly from two-way zero-one or contingency tables with fixed marginal sums and a given set of structural zeros. The SIS procedure samples tables column by column or cell by cell by using appropriate proposal distributions, and enables us to approximate closely the null distributions of a number of test statistics involved in such tables. When structural zeros are on the diagonal or follow certain patterns, more efficient SIS algorithms are developed which guarantee that every generated table is valid. Examples show that our methods can be applied to make conditional inference on zero-one and contingency tables, and are more efficient than other existing Monte Carlo algorithms.
Crowd-Blending Privacy
Abstract

Cited by 5 (0 self)
We introduce a new definition of privacy called crowd-blending privacy that strictly relaxes the notion of differential privacy. Roughly speaking, k-crowd-blending private sanitization of a database requires that each individual i in the database “blends” with k other individuals j in the database, in the sense that the output of the sanitizer is “indistinguishable” if i’s data is replaced by j’s. We demonstrate crowd-blending private mechanisms for histograms and for releasing synthetic data points, achieving strictly better utility than what is possible using differentially private mechanisms. Additionally, we demonstrate that if a crowd-blending private mechanism is combined with a “pre-sampling” step, where the individuals in the database are randomly drawn from some underlying population (as is often the case during data collection), then the combined mechanism satisfies not only differential privacy, but also the stronger notion of zero-knowledge privacy. This holds even if the pre-sampling is slightly biased and an adversary knows whether certain individuals were sampled or not. Taken together, our results yield a practical approach for collecting and privately releasing data while ensuring higher utility than previous approaches.
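A toy reading of the histogram case may help fix intuition: release exact counts for bins where each occupant blends with at least k−1 others, and suppress smaller bins. This is our hedged sketch of the flavour of such a mechanism, not the paper's construction verbatim; all names are ours.

```python
def crowd_blending_histogram(values, bins, k):
    """Sketch of a k-crowd-blending style histogram: a bin with at
    least k occupants is released exactly (everyone in it blends with
    a crowd of size >= k); smaller bins are suppressed to 0."""
    counts = {b: 0 for b in bins}
    for v in values:
        counts[v] += 1  # assumes every value falls in a known bin
    return {b: (c if c >= k else 0) for b, c in counts.items()}
```

Note the contrast with differential privacy: well-populated bins need no noise at all here, which is where the utility gain the abstract describes comes from.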
Noisy threshold functions for modelling causal independence in Bayesian networks
, 2006
Abstract

Cited by 3 (2 self)
Causal independence modelling is a well-known method both for reducing the size of probability tables and for explaining the underlying mechanisms in Bayesian networks. Many Bayesian network models incorporate causal independence assumptions; however, only the noisy OR and noisy AND, two examples of causal independence models, are used in practice. Their underlying assumption that either at least one cause, or all causes together, give rise to an effect, however, seems unnecessarily restrictive. In the present paper a new, more flexible causal independence model is proposed, based on the Boolean threshold function. A connection is established between conditional probability distributions based on the noisy threshold model and the Poisson binomial distribution, and the basic properties of this probability distribution are studied in some depth. We present and analyse recursive methods as well as approximation and bounding techniques to assess the conditional probabilities in the noisy threshold models.
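The Poisson binomial connection suggests a small sketch of the recursive method the abstract mentions: build up the distribution of the number of active causes one cause at a time, then sum the tail at the threshold. The threshold reading of the model and all names are our own illustration, not the paper's code.

```python
def count_pmf(probs):
    """Poisson binomial pmf of the number of active causes, via the
    standard one-cause-at-a-time recursion."""
    pmf = [1.0]
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)      # cause stays inactive
            nxt[k + 1] += mass * p          # cause becomes active
        pmf = nxt
    return pmf

def noisy_threshold(probs, tau):
    """P(effect) when the effect fires iff at least tau causes are
    active. tau = 1 recovers the noisy OR, tau = len(probs) the
    noisy AND."""
    return sum(count_pmf(probs)[tau:])
```

With `probs = [0.2, 0.5]`, `tau = 1` gives the noisy-OR value 0.6 and `tau = 2` the noisy-AND value 0.1, matching the two special cases the abstract singles out.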
Simulation-based randomized systematic PPS sampling under substitution of units
, 2008
Abstract

Cited by 1 (0 self)
Simulation-based randomized systematic PPS sampling under substitution of units
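Only the title survives for this entry, so the following is a generic sketch of randomized systematic PPS sampling (random unit order, one uniform start, unit skip); the paper's substitution-of-units mechanism is not modelled, and all names are ours.

```python
import math
import random

def randomized_systematic_pps(sizes, n_sample, rng):
    """Randomized systematic PPS sampling: inclusion probabilities
    proportional to size, a random unit order, then one systematic
    pass with selection points u, u+1, u+2, ...
    Assumes n_sample * size / total <= 1 for every unit."""
    total = sum(sizes)
    pi = [n_sample * s / total for s in sizes]   # inclusion probabilities
    order = list(range(len(sizes)))
    rng.shuffle(order)                           # the 'randomized' part
    u = rng.random()                             # single uniform start
    sample, cum = [], 0.0
    for i in order:
        lo, cum = cum, cum + pi[i]
        # unit i is hit iff some point u + k (k = 0, 1, ...) lies in [lo, cum)
        if math.floor(cum - u) > math.floor(lo - u):
            sample.append(i)
    return sorted(sample)
```

Because the n_sample selection points each fall in exactly one cumulative interval, every draw has exactly n_sample units, while unit i is included with probability pi[i].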
Bayesian Comparison of Machine Learning Algorithms on Single and Multiple Datasets
Abstract

Cited by 1 (0 self)
We propose a new method for comparing learning algorithms on multiple tasks, based on a novel nonparametric test that we call the Poisson binomial test. The key aspect of this work is that we provide a formal definition of what it means for one algorithm to be better than another. We are also able to take into account the dependencies induced when evaluating classifiers on the same test set, and we make optimal use (in the Bayesian sense) of all the testing data we have. We demonstrate empirically that our approach is more reliable than the sign test and the Wilcoxon signed-rank test, the current state of the art for algorithm comparisons.
Rare-Event Simulation and Counting Problems
, 2009
Abstract
Randomized approximation algorithms for counting problems have been the subject of many papers and monographs in Theoretical Computer Science (e.g., [13, 15, 25, 26]). At the same time, rare-event simulation methodology has a long history of development within the Applied Probability and Operations Research communities.
Probabilistic n-Choose-k Models for Classification and Ranking
Abstract
In categorical data there is often structure in the number of variables that take on each label. For example, the total number of objects in an image and the number of highly relevant documents per query in web search both tend to follow a structured distribution. In this paper, we study a probabilistic model that explicitly includes a prior distribution over such counts, along with a count-conditional likelihood that defines probabilities over all subsets of a given size. When labels are binary and the prior over counts is a Poisson-Binomial distribution, a standard logistic regression model is recovered, but for other count distributions, such priors induce global dependencies and combinatorics that appear to complicate learning and inference. However, we demonstrate that simple, efficient learning procedures can be derived for more general forms of this model. We illustrate the utility of the formulation by exploring applications to multi-object classification, learning to rank, and top-K classification.
Dynamic Scaled Sampling for Deterministic Constraints
Abstract
Deterministic and near-deterministic relationships among subsets of random variables in multivariate systems are known to cause serious problems for Monte Carlo algorithms. We examine the case in which the relationship Z = f(X1, ..., Xk) holds, where each Xi has a continuous prior pdf and we wish to obtain samples from the conditional distribution P(X1, ..., Xk | Z = s). When f is addition, the problem is NP-hard even when the Xi are independent. In more restricted cases, for example i.i.d. Boolean or categorical Xi, efficient exact samplers have been obtained previously. For the general continuous case, we propose a dynamic scaling algorithm (DYSC), and prove that it has O(k) expected running time and finite variance. We discuss generalizations of DYSC to functions f described by binary operation trees. We evaluate the algorithm on several examples.