Model selection and accounting for model uncertainty in graphical models using Occam's window
, 1993
Cited by 213 (42 self)
We consider the problem of model selection and accounting for model uncertainty in high-dimensional contingency tables, motivated by expert system applications. The approach most used currently is a stepwise strategy guided by tests based on approximate asymptotic P-values leading to the selection of a single model; inference is then conditional on the selected model. The sampling properties of such a strategy are complex, and the failure to take account of model uncertainty leads to underestimation of uncertainty about quantities of interest. In principle, a panacea is provided by the standard Bayesian formalism which averages the posterior distributions of the quantity of interest under each of the models, weighted by their posterior model probabilities. Furthermore, this approach is optimal in the sense of maximising predictive ability. However, this has not been used in practice because computing the posterior model probabilities is hard and the number of models is very large (often greater than 10^11). We argue that the standard Bayesian formalism is unsatisfactory and we propose an alternative Bayesian approach that, we contend, takes full account of the true model uncertainty by averaging over a much smaller set of models. An efficient search algorithm is developed for finding these models. We consider two classes of graphical models that arise in expert systems: the recursive causal models and the decomposable
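The averaging idea in the abstract above can be sketched in a few lines. This is a minimal illustration, not the authors' algorithm: it assumes the (unnormalised) log posterior probability of each candidate model and a point estimate of the quantity of interest under each model are already available, and it applies only a simple ratio rule that discards models far less probable than the best one. The function name, the `cutoff` parameter, and its default of 20 are hypothetical choices made for this sketch.

```python
import math


def occams_window_average(log_posteriors, estimates, cutoff=20.0):
    """Average per-model estimates over a reduced model set.

    log_posteriors: unnormalised log posterior probability of each model.
    estimates: point estimate of the quantity of interest under each model.
    cutoff: hypothetical threshold; a model is dropped when the best model
            is more than `cutoff` times as probable.
    """
    best = max(log_posteriors)
    # Keep only models whose posterior odds against the best model are small.
    kept = [m for m, lp in enumerate(log_posteriors)
            if best - lp <= math.log(cutoff)]
    # Renormalise posterior probabilities over the retained models.
    raw = [math.exp(log_posteriors[m] - best) for m in kept]
    total = sum(raw)
    weights = [w / total for w in raw]
    # Weighted average of the per-model estimates.
    average = sum(w * estimates[m] for w, m in zip(weights, kept))
    return kept, weights, average
```

Working with log probabilities and subtracting the maximum before exponentiating avoids underflow when many models have very small posterior mass.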
Algebraic Algorithms for Sampling from Conditional Distributions
- Annals of Statistics
, 1995
Cited by 152 (12 self)
We construct Markov chain algorithms for sampling from discrete exponential families conditional on a sufficient statistic. Examples include generating tables with fixed row and column sums and higher dimensional analogs. The algorithms involve finding bases for associated polynomial ideals and so an excursion into computational algebraic geometry.
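For the two-way case mentioned in the abstract above (tables with fixed row and column sums), the random walk can be sketched as follows. This is a minimal sketch under stated assumptions: it uses only the elementary move that adds +1 and -1 at the corners of a randomly chosen 2x2 sub-table, rejects any move that would create a negative cell, and therefore targets the uniform distribution over tables with the given margins. The function names are hypothetical, and the algebraic machinery needed for higher-dimensional analogs is not shown.

```python
import random


def margin_preserving_step(table, rng):
    """One step of the +1/-1 walk on a two-way table of counts."""
    n_rows, n_cols = len(table), len(table[0])
    # Ordered pairs, so both signs of each move are proposed equally often.
    i, k = rng.sample(range(n_rows), 2)
    j, l = rng.sample(range(n_cols), 2)
    # The move subtracts 1 at (i, l) and (k, j); stay put if that
    # would produce a negative count.
    if table[i][l] == 0 or table[k][j] == 0:
        return table
    new_table = [row[:] for row in table]
    new_table[i][j] += 1
    new_table[k][l] += 1
    new_table[i][l] -= 1
    new_table[k][j] -= 1
    return new_table


def run_chain(table, n_steps, seed=0):
    """Run the walk for n_steps and return the final table."""
    rng = random.Random(seed)
    for _ in range(n_steps):
        table = margin_preserving_step(table, rng)
    return table
```

Each move changes two cells in every affected row and column by +1 and -1, so all row and column sums are unchanged at every step.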
Preliminaries to a Theory of Speech Disfluencies
, 1994
Cited by 97 (7 self)
This thesis examines disfluencies (e.g., "um", repeated words, and a variety of forms of self-repair) in the spontaneous speech of adult normal speakers of American English. Despite their prevalence, disfluencies have traditionally been viewed as irregular events and have received little attention. The goal of the thesis is to provide evidence that, on the contrary, disfluencies show remarkably regular trends in a number of dimensions. These regularities have consequences for models of human language production; they can also be exploited to improve performance in speech applications. The method includes analysis of over 5000 hand-annotated disfluencies from a database (250,000 words) containing three different styles of spontaneous speech: task-oriented human-computer dialog, task-oriented human-human dialog, and human-human conversation on a prescribed topic. The approach is theory-neutral and strongly data-driven. The annotations correspond to observable characteristics ("features") ...
Split models for contingency tables
, 2003
Cited by 6 (1 self)
A framework for log-linear models with context specific independence structures, i.e., conditional independencies holding only for specific values of the conditioning variables, is introduced. This framework is constituted by the class of split models. A software package named YGGDRASIL, which is designed for statistical inference in split models, is also presented. Split models are an extension of graphical models for contingency tables. The treatment of split models includes estimation, representation and a Markov property for reading off independencies holding in a specific context. Two examples, including an illustration of the use of YGGDRASIL, are
Three Centuries of Categorical Data Analysis: Log-linear Models and Maximum Likelihood Estimation
Cited by 5 (3 self)
The common view of the history of contingency tables is that it begins in 1900 with the work of Pearson and Yule, but it extends back at least into the 19th century. Moreover it remains an active area of research today. In this paper we give an overview of this history focussing on the development of log-linear models and their estimation via the method of maximum likelihood. S. N. Roy played a crucial role in this development with two papers co-authored with his students S. K. Mitra and Marvin Kastenbaum, at roughly the mid-point temporally in this development. Then we describe a problem that eluded Roy and his students, that of the implications of sampling zeros for the existence of maximum likelihood estimates for log-linear models. Understanding the problem of non-existence is crucial to the analysis of large sparse contingency tables. We introduce some relevant results from the application of algebraic geometry to the study of this statistical problem.
poLCA: Polytomous Variable Latent Class Analysis. R Package Version 1.1. http://userwww.service.emory.edu/~dlinzer/poLCA
, 2007
Cited by 3 (0 self)
poLCA is a software package for the estimation of latent class and latent class regression models for polytomous outcome variables, implemented in the R statistical computing environment. Both models can be called using a single simple command line. The basic latent class model is a finite mixture model in which the component distributions are assumed to be multi-way cross-classification tables with all variables mutually independent. The latent class regression model further enables the researcher to estimate the effects of covariates on predicting latent class membership. poLCA uses expectation-maximization and Newton-Raphson algorithms to find maximum likelihood estimates of the model parameters. This user's guide to the poLCA software package draws extensively from Linzer and Lewis (Forthcoming).
1 Quick Start
This section is provided for users who wish to skip the technical details and proceed directly to the estimation of latent class and latent class regression models.
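The finite mixture structure of the basic latent class model described in the poLCA entry above can be illustrated with a short sketch. This is not poLCA's interface (poLCA is an R package) and it performs no estimation: it only evaluates, for given parameters, the probability of one response pattern and the posterior class memberships, which is the quantity an expectation-maximization E-step would compute. The function name and the nested-list layout of `item_probs` are assumptions made for this illustration.

```python
def class_posteriors(response, class_priors, item_probs):
    """Posterior class membership for one response pattern.

    response: list of category indices, one per manifest variable.
    class_priors: prior probability of each latent class.
    item_probs[r][j][k]: probability that variable j takes category k
        given membership in class r (variables are assumed mutually
        independent within a class).
    Returns (posteriors, marginal probability of the response pattern).
    """
    joint = []
    for prior, probs in zip(class_priors, item_probs):
        likelihood = prior
        # Within a class, the pattern probability is a product over variables.
        for j, k in enumerate(response):
            likelihood *= probs[j][k]
        joint.append(likelihood)
    # The mixture (marginal) probability sums over the latent classes.
    marginal = sum(joint)
    return [x / marginal for x in joint], marginal
```

With two classes and two binary items, a pattern that is likely under the first class and unlikely under the second receives most of its posterior mass from the first class.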
LINEAR MODELS ANALYSIS OF INCOMPLETE MULTIVARIATE CATEGORICAL DATA
, 1972
Cited by 1 (0 self)
This research deals with experiments or surveys producing multivariate categorical data which is incomplete, in the sense that not all variables of interest are measured on every subject or element of the sample. For the most part, incompleteness is taken to arise by design, rather than by random failure of the measurement process. In these circumstances, one can often assume that counts derived from appropriate disjoint subsets of the data arise from independent multinomial distributions with linearly related parameters. Best asymptotically normal estimates of these parameters may be determined by maximizing the likelihood of the observations or by minimizing Pearson's χ², Neyman's χ₁²,
Sequences of regressions and their independences
, 2012
Cited by 1 (1 self)
Ordered sequences of univariate or multivariate regressions provide statistical models for analysing data from randomized, possibly sequential interventions, from cohort or multi-wave panel studies, but also from cross-sectional or retrospective studies. Conditional independences are captured by what we name regression graphs, provided the generated distribution shares some properties with a joint Gaussian distribution. Regression graphs extend purely directed, acyclic graphs by two types of undirected graph, one type for components of joint responses and the other for components of the context vector variable. We review the special features and the history of regression graphs, prove criteria for Markov equivalence and discuss the notion of simpler statistical covering models. Knowledge of Markov equivalence provides alternative interpretations of a given sequence of regressions, is essential for machine learning strategies and permits the use of the simple graphical criteria of regression graphs on graphs for which the corresponding criteria are in general more complex. Under the known conditions that a Markov equivalent directed acyclic graph exists for any given regression graph, we give a polynomial time algorithm to find one such graph.
Office: Griffin-Floyd 204
reserve for this course at the Science library) This course surveys methods for the analysis of categorical response variables. The main subject areas covered are descriptive and inferential statistics for two-way and three-way contingency tables, generalized linear models for discrete responses, binary regression models (emphasizing logistic regression), models for multi-category responses, loglinear models for contingency tables, matched pairs, and maximum likelihood inference for categorical response data.

