Cross-fertilizing strategies for better EM mountain climbing and DA field exploration: A graphical guide book
, 2009
Abstract

In recent years, a variety of extensions and refinements have been developed for data augmentation based model fitting routines. These developments aim to extend the application, improve the speed, and/or simplify the implementation of data augmentation methods, such as the deterministic EM algorithm for mode finding and the stochastic Gibbs sampler and other auxiliary-variable based methods for posterior sampling. In this overview article we graphically illustrate and compare a number of these extensions, all of which aim to maintain the simplicity and computational stability of their predecessors. We particularly emphasize the usefulness of identifying similarities between the deterministic and stochastic counterparts as we seek more efficient computational strategies. We also demonstrate the applicability of data augmentation methods for handling complex models
Robust demographic inference from genomic and SNP data. PLoS Genetics 9
, 2013
Abstract

We introduce a flexible and robust simulation-based framework to infer demographic parameters from the site frequency spectrum (SFS) computed on large genomic datasets. We show that our composite-likelihood approach allows one to study evolutionary models of arbitrary complexity, which cannot be tackled by other current likelihood-based methods. For simple scenarios, our approach compares favorably in terms of accuracy and speed with ∂a∂i, the current reference in the field, while showing better convergence properties for complex models. We first apply our methodology to noncoding genomic SNP data from four human populations. To infer their demographic history, we compare neutral evolutionary models of increasing complexity, including unsampled populations. We further show the versatility of our framework by extending it to the inference of demographic parameters from SNP chips with known ascertainment, such as that recently released by Affymetrix to study human origins. Whereas previous ways of handling ascertained SNPs were either restricted to a single population or only allowed the inference of divergence time between a pair of populations, our framework can correctly infer parameters of more complex models including the divergence of several populations, bottlenecks and migration. We apply this approach to the reconstruction of African demography using two distinct ascertained human SNP panels studied under two evolutionary models. The two SNP panels lead to globally very similar estimates and confidence intervals, and suggest an ancient divergence (>110 Ky) between Yoruba and San populations. Our methodology appears
High-dimensional regression with Gaussian mixtures and partially-latent response variables
, 1308
An Incomplete Data Approach to the Ecological Inference Problem
, 2005
Abstract

In this paper, we propose to formulate ecological inference as a coarse data problem where only a subset of the complete-data sample space is observed. Applying the related assumptions and theoretical results of Heitjan and Rubin (1991), we formally identify three key factors that affect ecological inference: distributional, contextual and aggregation effects. Different modeling strategies are discussed to deal with distributional and contextual effects. While aggregation effects cannot be statistically adjusted, we show how to formally quantify the magnitude of such effects through the use of the Expectation-Maximization algorithm. The paper concludes with simulations and empirical applications that assess the performance of the proposed models. C code used to implement the proposed method is available with an easy-to-use R interface.
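To make the coarse data idea concrete, the following is a minimal sketch of the classical method of bounds for a 2x2 ecological table. It is not the model proposed in the paper. The function name `method_of_bounds` and the symbols `x` (share of the unit belonging to group 1), `y` (overall outcome rate in the unit) and `w1` (unobserved outcome rate within group 1) are assumptions introduced here for illustration only.

```python
def method_of_bounds(x, y):
    """Deterministic bounds on the unobserved within-group rate w1.

    Assumed notation (hypothetical, for illustration):
      x  : share of the unit that belongs to group 1, 0 < x <= 1
      y  : observed overall outcome rate in the unit, 0 <= y <= 1
      w1 : unobserved outcome rate within group 1
    The accounting identity y = x * w1 + (1 - x) * w2, with w1 and w2
    both in [0, 1], restricts w1 to an interval.
    """
    if not (0.0 < x <= 1.0 and 0.0 <= y <= 1.0):
        raise ValueError("need 0 < x <= 1 and 0 <= y <= 1")
    lower = max(0.0, (y - (1.0 - x)) / x)  # case where w2 = 1
    upper = min(1.0, y / x)                # case where w2 = 0
    return lower, upper


if __name__ == "__main__":
    # One aggregate unit: half the unit is in group 1, overall rate is 25%.
    print(method_of_bounds(0.5, 0.25))
```

Only the margins `x` and `y` are observed, so every value of `w1` inside the returned interval is consistent with the aggregate data; this is the sense in which only part of the complete-data sample space is seen.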
Genetic Architecture of Local Adaptation in Lunar and Diurnal Emergence Times of the Marine Midge Clunio marinus (Chironomidae, Diptera)
, 2012
Abstract

Circadian rhythms pre-adapt the physiology of most organisms to predictable daily changes in the environment. Some marine organisms also show endogenous circalunar rhythms. The genetic basis of the circalunar clock and its interaction with the circadian clock is unknown. Both clocks can be studied in the marine midge Clunio marinus (Chironomidae, Diptera), as different populations have different local adaptations in their lunar and diurnal rhythms of adult emergence, which can be analyzed by crossing experiments. We investigated the genetic basis of population variation in clock properties by constructing the first genetic linkage map for this species, and performing quantitative trait locus (QTL) analysis on variation in both lunar and diurnal timing. The genome has a genetic length of 167–193 centimorgans based on a linkage map using 344 markers, and a physical size of 95–140 megabases estimated by flow cytometry. Mapping the sex-determining locus shows that females are the heterogametic sex, unlike most other Chironomidae. We identified two QTL each for lunar emergence time and diurnal emergence time. The distribution of QTL confirms a previously hypothesized genetic basis to a correlation of lunar and diurnal emergence times in natural populations. Mapping of clock genes and light receptors identified ciliary opsin 2 (cOps2) as a candidate to be involved in both lunar and diurnal timing; cryptochrome 1 (cry1) as a candidate gene for lunar timing; and two timeless (tim2, tim3) genes as candidate genes for diurnal timing. This QTL analysis of lunar rhythmicity, the first in any species, provides a unique entree into the molecular analysis of the lunar
ality Reduction
Abstract
The problem of approximating high-dimensional data with a low-dimensional representation is addressed. The article makes the following contributions. An inverse regression framework is proposed, which exchanges the roles of input and response, such that the low-dimensional variable becomes the regressor, and which is tractable. A mixture of locally-linear probabilistic mapping model is introduced that starts with estimating the parameters of the inverse regression, and follows with inferring closed-form solutions for the forward parameters of the high-dimensional regression problem of interest. Moreover, a partially-latent paradigm is introduced, such that the vector-valued response variable is composed of both observed and latent entries, thus being able to deal with data contaminated by experimental artifacts that cannot be explained with noise models. The proposed probabilistic formulation could be viewed as a latent-variable augmentation of regression. Expectation-maximization (EM) procedures are introduced, based on a data augmentation strategy which facilitates the maximum-likelihood search over the model parameters. Two augmentation schemes are proposed and the associated EM inference procedures are described in detail; they may well be viewed as generalizations of a number of EM regression, dimension reduction, and factor analysis algorithms. The proposed framework is validated with both synthetic and real data. Experimental evidence is provided that the method outperforms several existing regression techniques.
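As a generic illustration of how data augmentation turns a maximum-likelihood search into alternating E and M steps, here is a minimal sketch of EM for a two-component univariate Gaussian mixture. It is not the locally-linear mapping model or either augmentation scheme of the article; the function name `em_gmm_1d`, the initialisation, and the variance floor are assumptions made for this example.

```python
import numpy as np


def em_gmm_1d(x, n_iter=100, var_floor=1e-6):
    """EM for a two-component 1-D Gaussian mixture (illustrative sketch).

    The latent component label of each point plays the role of the
    augmented data: the E-step computes its posterior, the M-step
    maximises the expected complete-data log-likelihood in closed form.
    """
    x = np.asarray(x, dtype=float)
    # Assumed crude initialisation: means at the data extremes.
    mu = np.array([x.min(), x.max()])
    var = np.full(2, max(x.var(), var_floor))
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibilities, shape (n_points, 2).
        diff = x[:, None] - mu
        dens = np.exp(-0.5 * diff ** 2 / var) / np.sqrt(2.0 * np.pi * var)
        resp = pi * dens
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: weighted closed-form updates.
        nk = resp.sum(axis=0)
        pi = nk / x.size
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        var = np.maximum(var, var_floor)  # guard against collapse
    return pi, mu, var


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = np.concatenate([rng.normal(-2.0, 0.5, 300),
                           rng.normal(3.0, 1.0, 300)])
    print(em_gmm_1d(data))
```

The same alternation underlies the EM procedures mentioned in the abstract, although there the latent quantities and the closed-form updates are specific to the regression model.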
STATISTICS IN MEDICINE
Abstract
Modelling the distribution of ischaemic stroke-specific survival time using an EM-based mixture approach with random effects adjustment
Incremental update; Length of stay; Machine learning
Abstract
An incremental EM-based learning approach for online prediction of hospital resource utilization