Results 1-10 of 12
Cross-fertilizing strategies for better EM mountain climbing and DA field exploration: A graphical guide book
, 2009
Abstract

Cited by 8 (5 self)
In recent years, a variety of extensions and refinements have been developed for data-augmentation-based model-fitting routines. These developments aim to extend the application, improve the speed, and/or simplify the implementation of data augmentation methods, such as the deterministic EM algorithm for mode finding and the stochastic Gibbs sampler and other auxiliary-variable-based methods for posterior sampling. In this overview article we graphically illustrate and compare a number of these extensions, all of which aim to maintain the simplicity and computational stability of their predecessors. We particularly emphasize the usefulness of identifying similarities between the deterministic and stochastic counterparts as we seek more efficient computational strategies. We also demonstrate the applicability of data augmentation methods for handling complex models
An Incomplete Data Approach to the Ecological Inference Problem
, 2005
Abstract

Cited by 1 (0 self)
In this paper, we propose to formulate ecological inference as a coarse data problem where only a subset of the complete-data sample space is observed. Applying the related assumptions and theoretical results of Heitjan and Rubin (1991), we formally identify three key factors that affect ecological inference: distributional, contextual, and aggregation effects. Different modeling strategies are discussed to deal with distributional and contextual effects. While aggregation effects cannot be statistically adjusted, we show how to formally quantify the magnitude of such effects through the use of the Expectation-Maximization algorithm. The paper concludes with simulations and empirical applications that assess the performance of the proposed models. C code used to implement the proposed method is available with an easy-to-use R interface.
On the EM algorithm for overdispersed count data
, 1997
Abstract
In this paper, we consider the use of the EM algorithm for the fitting of distributions by maximum likelihood to overdispersed count data. In the course of this, we also provide a review of various approaches that have been proposed for the analysis of such data. As the Poisson and binomial regression models, which are often adopted in the first instance for these analyses, are particular examples of a generalized linear model (GLM), the focus of the account is on the modifications and extensions to GLMs for the handling of overdispersed count data.
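As a rough illustration of the idea in the abstract above, the sketch below runs EM on a two-component Poisson mixture, one simple way to model counts whose variance exceeds their mean. The mixture model, the function names `poisson_pmf` and `em_poisson_mixture`, and the starting values are assumptions chosen for illustration; they are not taken from the paper, which concerns extensions of GLMs more broadly.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability mass, computed via logs for numerical stability."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def em_poisson_mixture(counts, lam1, lam2, pi=0.5, n_iter=500, tol=1e-10):
    """Fit pi * Poisson(lam1) + (1 - pi) * Poisson(lam2) by EM.

    Returns (pi, lam1, lam2, history), where history holds the observed-data
    log-likelihood evaluated at the start of each iteration.
    """
    n = len(counts)
    history = []
    for _ in range(n_iter):
        # E-step: posterior probability that each count came from component 1.
        resp = []
        loglik = 0.0
        for k in counts:
            a = pi * poisson_pmf(k, lam1)
            b = (1.0 - pi) * poisson_pmf(k, lam2)
            resp.append(a / (a + b))
            loglik += math.log(a + b)
        history.append(loglik)

        # M-step: weighted maximum-likelihood updates in closed form.
        s = sum(resp)
        pi = s / n
        lam1 = sum(r * k for r, k in zip(resp, counts)) / s
        lam2 = sum((1.0 - r) * k for r, k in zip(resp, counts)) / (n - s)

        # Stop once the log-likelihood gain between iterations is negligible.
        if len(history) > 1 and history[-1] - history[-2] < tol:
            break
    return pi, lam1, lam2, history


# Hypothetical bimodal counts: sample mean 6.25, variance well above the mean.
counts = [0, 1, 1, 2, 2, 2, 3, 3, 4, 9, 10, 11, 12, 12, 13, 15]
pi_hat, lam1_hat, lam2_hat, history = em_poisson_mixture(counts, lam1=1.0, lam2=10.0)
```

Two properties are worth checking on any run: the log-likelihood values in `history` should never decrease, and the fitted mixture mean `pi_hat * lam1_hat + (1 - pi_hat) * lam2_hat` should equal the sample mean of the counts.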
Incremental update; Length of stay; Machine learning
Abstract
An incremental EM-based learning approach for online prediction of hospital resource utilization
STATISTICS IN MEDICINE
Abstract
Modelling the distribution of ischaemic stroke-specific survival time using an EM-based mixture approach with random effects adjustment
Bayesian and Likelihood Inference for 2 × 2 Ecological Tables: An Incomplete-Data Approach
Abstract
Ecological inference is a statistical problem where aggregate-level data are used to make inferences about individual-level behavior. In this article, we conduct a theoretical and empirical study of Bayesian and likelihood inference for 2 × 2 ecological tables by applying the general statistical framework of incomplete data. We first show that the ecological inference problem can be decomposed into three factors: distributional effects, which address the possible misspecification of parametric modeling assumptions about the unknown distribution of missing data; contextual effects, which represent the possible correlation between missing data and observed variables; and aggregation effects, which are directly related to the loss of information caused by data aggregation. We then examine how these three factors affect inference and offer new statistical methods to address each of them. To deal with distributional effects, we propose a nonparametric Bayesian model based on a Dirichlet process prior, which relaxes common parametric assumptions. We also identify the statistical adjustments necessary to account for contextual effects. Finally, although little can be done to cope with aggregation effects, we offer a method to quantify the magnitude of such effects in order to formally assess their severity. We use simulated and real data sets to empirically investigate the consequences of these three factors and to evaluate the performance of our proposed methods. C code, along with an easy-to-use R interface, is publicly available for implementing our proposed methods (Imai, Lu, and Strauss, forthcoming). Authors' note: This article is in part based on two working papers by Imai and Lu, “Parametric and Nonparametric
Genetic Architecture of Local Adaptation in Lunar and Diurnal Emergence Times of the Marine Midge Clunio marinus (Chironomidae, Diptera)
, 2012
Abstract
Circadian rhythms preadapt the physiology of most organisms to predictable daily changes in the environment. Some marine organisms also show endogenous circalunar rhythms. The genetic basis of the circalunar clock and its interaction with the circadian clock is unknown. Both clocks can be studied in the marine midge Clunio marinus (Chironomidae, Diptera), as different populations have different local adaptations in their lunar and diurnal rhythms of adult emergence, which can be analyzed by crossing experiments. We investigated the genetic basis of population variation in clock properties by constructing the first genetic linkage map for this species, and performing quantitative trait locus (QTL) analysis on variation in both lunar and diurnal timing. The genome has a genetic length of 167–193 centimorgans based on a linkage map using 344 markers, and a physical size of 95–140 megabases estimated by flow cytometry. Mapping the sex determining locus shows that females are the heterogametic sex, unlike most other Chironomidae. We identified two QTL each for lunar emergence time and diurnal emergence time. The distribution of QTL confirms a previously hypothesized genetic basis to a correlation of lunar and diurnal emergence times in natural populations. Mapping of clock genes and light receptors identified ciliary opsin 2 (cOps2) as a candidate to be involved in both lunar and diurnal timing; cryptochrome 1 (cry1) as a candidate gene for lunar timing; and two timeless (tim2, tim3) genes as candidate genes for diurnal timing. This QTL analysis of lunar rhythmicity, the first in any species, provides a unique entree into the molecular analysis of the lunar
Reduction
, 2014
Abstract
The problem of approximating high-dimensional data with a low-dimensional representation is addressed. The article makes the following contributions. An inverse regression framework is proposed, which exchanges the roles of input and response, such that the low-dimensional variable becomes the regressor, and which is tractable. A mixture of locally-linear probabilistic mappings is introduced, which starts with estimating the parameters of the inverse regression and follows with inferring closed-form solutions for the forward parameters of the high-dimensional regression problem of interest. Moreover, a partially-latent paradigm is introduced, such that the vector-valued response variable is composed of both observed and latent entries, thus being able to deal with data contaminated by experimental artifacts that cannot be explained with noise models. The proposed probabilistic formulation could be viewed as a latent-variable augmentation of regression. Expectation-maximization (EM) procedures are introduced, based on a data augmentation strategy which facilitates the maximum-likelihood search over the model parameters. Two augmentation schemes are proposed and the associated EM inference procedures are described in detail; they may well be viewed as generalizations of a number of EM regression, dimension reduction, and factor analysis algorithms. The proposed framework is validated with both synthetic and real data. Experimental evidence is provided that the method outperforms several existing regression techniques.