Results 1 to 10 of 36
Algebraic Algorithms for Sampling from Conditional Distributions
 Annals of Statistics
, 1995
Cited by 182 (15 self)
We construct Markov chain algorithms for sampling from discrete exponential families conditional on a sufficient statistic. Examples include generating tables with fixed row and column sums and higher dimensional analogs. The algorithms involve finding bases for associated polynomial ideals and so an excursion into computational algebraic geometry.
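The fixed-margin example in this abstract can be illustrated with a minimal Python sketch. It assumes a two-way table and a uniform target distribution over tables with the given row and column sums, and it uses the elementary moves that add +1 and -1 on the corners of a 2x2 subtable; the function names `margins`, `step`, and `sample_table` are hypothetical. It is not the paper's general algorithm, which relies on bases of polynomial ideals and also covers higher-dimensional tables.

```python
import random


def margins(table):
    """Return (row sums, column sums) of a two-way table of counts."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    return rows, cols


def step(table, rng):
    """Propose one elementary move and apply it in place if it stays non-negative.

    The move adds +1 at (i, k) and (j, l) and -1 at (i, l) and (j, k),
    so every row sum and column sum is unchanged. Because (i, j) is drawn
    in random order, the move and its reverse are proposed equally often.
    """
    i, j = rng.sample(range(len(table)), 2)
    k, l = rng.sample(range(len(table[0])), 2)
    if table[i][l] > 0 and table[j][k] > 0:
        table[i][k] += 1
        table[j][l] += 1
        table[i][l] -= 1
        table[j][k] -= 1
    return table


def sample_table(start, n_steps, seed=0):
    """Run the chain for n_steps from a copy of `start` (assumed at least 2x2)."""
    rng = random.Random(seed)
    table = [row[:] for row in start]
    for _ in range(n_steps):
        step(table, rng)
    return table


start = [[3, 1, 2], [0, 4, 1], [2, 2, 5]]
result = sample_table(start, n_steps=1000)
print(result, margins(result) == margins(start))
```

Rejected proposals leave the table unchanged, which is what keeps the uniform distribution stationary under this symmetric proposal; a non-uniform target such as the hypergeometric distribution would need an additional Metropolis acceptance step.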
Computing Maximum Likelihood Estimates in loglinear models
, 2006
Cited by 11 (3 self)
We develop computational strategies for extended maximum likelihood estimation, as defined in Rinaldo (2006), for general classes of loglinear models of widespread use, under Poisson and product-multinomial sampling schemes. We derive numerically efficient procedures for generating and manipulating design matrices and we propose various algorithms for computing the extended maximum likelihood estimates of the expectations of the cell counts. These algorithms make it possible to identify the set of estimable cell means for any given observable table and can be used for modifying traditional goodness-of-fit tests to accommodate a nonexistent MLE. We describe and take advantage of the connections between extended maximum likelihood
Three Centuries of Categorical Data Analysis: Loglinear Models and Maximum Likelihood Estimation
Cited by 6 (3 self)
The common view of the history of contingency tables is that it begins in 1900 with the work of Pearson and Yule, but it extends back at least into the 19th century. Moreover it remains an active area of research today. In this paper we give an overview of this history focussing on the development of loglinear models and their estimation via the method of maximum likelihood. S. N. Roy played a crucial role in this development with two papers coauthored with his students S. K. Mitra and Marvin Kastenbaum, at roughly the midpoint temporally in this development. Then we describe a problem that eluded Roy and his students, that of the implications of sampling zeros for the existence of maximum likelihood estimates for loglinear models. Understanding the problem of nonexistence is crucial to the analysis of large sparse contingency tables. We introduce some relevant results from the application of algebraic geometry to the study of this statistical problem.
Sequences of regressions and their independences
, 2012
Cited by 4 (1 self)
Ordered sequences of univariate or multivariate regressions provide statistical models for analysing data from randomized, possibly sequential interventions, from cohort or multi-wave panel studies, but also from cross-sectional or retrospective studies. Conditional independences are captured by what we name regression graphs, provided the generated distribution shares some properties with a joint Gaussian distribution. Regression graphs extend purely directed, acyclic graphs by two types of undirected graph, one type for components of joint responses and the other for components of the context vector variable. We review the special features and the history of regression graphs, prove criteria for Markov equivalence and discuss the notion of simpler statistical covering models. Knowledge of Markov equivalence provides alternative interpretations of a given sequence of regressions, is essential for machine learning strategies and permits the use of the simple graphical criteria of regression graphs on graphs for which the corresponding criteria are in general more complex. Under the known conditions that a Markov equivalent directed acyclic graph exists for any given regression graph, we give a polynomial time algorithm to find one such graph.
Sparse Contingency Tables and High-Dimensional Log-Linear Models for Alternative Splicing
 in Full-Length cDNA Libraries, Research Report 132, Swiss Federal Institute of Technology
, 2006
Cited by 3 (0 self)
Corinne Dahinden is a PhD student at the Seminar für Statistik, ETH Zürich, CH-8092 Zürich,
On the Index of Dissimilarity for Lack of Fit in Log Linear Models
Cited by 2 (0 self)
The index of dissimilarity, often denoted by Delta, is commonly used, especially in social science and with large datasets, to describe the lack of fit of models for categorical data. In this paper the definition and sampling properties of the index are investigated for general loglinear models. It is argued that in some applications a standardized version of the index is appropriate for interpretation. A simple, approximate variance formula is derived for the index, whether standardized or not. A simple bias reduction formula is also given. The accuracy of these formulae and of confidence intervals based upon them is investigated in a simulation study based on large-scale social mobility data. Key words: bias reduction; dissimilarity index; extended hypergeometric; folded normal; iterative proportional fitting; iterative scaling; stratified sampling.
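As a rough illustration of the quantity this abstract discusses, the following Python sketch computes the unstandardized index of dissimilarity in its usual form, Delta = (sum over cells of |observed - fitted|) / (2n), for a two-way table. The fitted values come from the independence model, which is an assumption made only to keep the example self-contained; the function names are hypothetical, and the paper's standardized version, variance formula, and bias reduction are not reproduced here.

```python
def independence_fit(table):
    """Fitted counts under row-column independence: row_i * col_j / n."""
    n = sum(map(sum, table))
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    return [[r * c / n for c in cols] for r in rows]


def dissimilarity_index(observed, fitted):
    """Half the summed absolute differences between observed and fitted
    counts, divided by the total count n."""
    n = sum(map(sum, observed))
    total = sum(
        abs(o - f)
        for obs_row, fit_row in zip(observed, fitted)
        for o, f in zip(obs_row, fit_row)
    )
    return total / (2 * n)


observed = [[10, 20], [30, 40]]
delta = dissimilarity_index(observed, independence_fit(observed))
print(round(delta, 4))  # 0.04
```

The value can be read as the proportion of observations that would have to move to another cell for the model to fit exactly.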
LINEAR MODELS ANALYSIS OF INCOMPLETE MULTIVARIATE CATEGORICAL DATA
, 1972
Cited by 1 (0 self)
This research deals with experiments or surveys producing multivariate categorical data which is incomplete, in the sense that not all variables of interest are measured on every subject or element of the sample. For the most part, incompleteness is taken to arise by design, rather than by random failure of the measurement process. In these circumstances, one can often assume that counts derived from appropriate disjoint subsets of the data arise from independent multinomial distributions with linearly related parameters. Best asymptotically normal estimates of these parameters may be determined by maximizing the likelihood of the observations or by minimizing Pearson's χ², Neyman's χ₁²,
Data Engineering
Cited by 1 (0 self)
A growing number of applications need access to video data stored in digital form on secondary storage devices (e.g., video-on-demand, multimedia messaging). As a result, video servers that are responsible for the storage and retrieval, at fixed rates, of hundreds of videos from disks are becoming increasingly important. Since video data tends to be voluminous, several disks are usually used in order to store the videos. A challenge is to devise schemes for the storage and retrieval of videos that distribute the workload evenly across disks, reduce the cost of the server and at the same time, provide good response times to client requests for video data. In this paper, we present schemes that are based on striping videos (fine-grained as well as coarse-grained) across disks in order to effectively utilize disk bandwidth. For the schemes, we show how an optimal-cost server architecture can be determined if data for a certain prespecified number of videos is to be concurrently retrieved...
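The idea of striping a video across disks to even out the workload can be sketched generically in Python. The example below assumes plain round-robin placement of fixed-size blocks, which is a common textbook form of coarse-grained striping and not necessarily any of the schemes this paper proposes; `stripe_blocks`, `disk_load`, and their parameters are hypothetical names.

```python
def stripe_blocks(n_blocks, n_disks, first_disk=0):
    """Assign block b of a video to disk (first_disk + b) mod n_disks."""
    return [(first_disk + b) % n_disks for b in range(n_blocks)]


def disk_load(placement, n_disks):
    """Count how many blocks each disk holds under a given placement."""
    load = [0] * n_disks
    for disk in placement:
        load[disk] += 1
    return load


placement = stripe_blocks(n_blocks=7, n_disks=3, first_disk=1)
print(placement)                # [1, 2, 0, 1, 2, 0, 1]
print(disk_load(placement, 3))  # [2, 3, 2]
```

Under this placement the per-disk block counts differ by at most one, so consecutive reads of a single video rotate across all disks.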