Bias Properties of Bayesian Statistics in Finite Mixture of Negative Binomial Regression Models in Crash Data Analysis
Accident Analysis and Prevention, 2010
Factors that cause heterogeneity in crash data are often unknown to researchers, and failure to accommodate such heterogeneity in statistical models can undermine the validity of empirical results. A recently proposed finite mixture for the negative binomial regression model has shown a potential advantage in addressing the unobserved heterogeneity as well as providing useful information about features of the population under study. Despite its usefulness, however, no study has been found to examine the performance of this finite mixture under various conditions of sample sizes and sample-mean values that are common in crash data analysis. This study investigated the bias associated with the Bayesian summary statistics (posterior mean and median) of dispersion parameters in the two-component finite mixture of negative binomial regression models. A simulation study was conducted using various sample sizes under different sample-mean values. Two prior specifications (non-informative and weakly-informative) on the dispersion parameter were also compared. The results showed that the posterior mean using the non-informative prior exhibited a high bias for the dispersion parameter and should be avoided when the dataset contains fewer than 2,000 observations (even for high sample values). The
Sensitivity in risk analyses with uncertain numbers
2006
Sensitivity analysis is a study of how changes in the inputs to a model influence the results of the model. Many techniques have recently been proposed for use when the model is probabilistic. This report considers the related problem of sensitivity analysis when the model includes uncertain numbers that can involve both aleatory and epistemic uncertainty and the method of calculation is Dempster-Shafer evidence theory or probability bounds analysis. Some traditional methods for sensitivity analysis generalize directly for use with uncertain numbers, but, in some respects, sensitivity analysis for these analyses differs from traditional deterministic or probabilistic sensitivity analyses. A case study of a dike reliability assessment illustrates several methods of sensitivity analysis, including traditional probabilistic assessment, local derivatives, and a “pinching” strategy that hypothetically reduces the epistemic uncertainty or aleatory uncertainty, or both, in an input variable to estimate the reduction of uncertainty in the outputs. The prospects for applying the methods to black box models are also considered.
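The pinching idea can be illustrated with a minimal sketch, assuming the simplest kind of uncertain number, a plain interval (purely epistemic uncertainty). The `margin` model, the input ranges and the percent-reduction summary below are hypothetical choices made for illustration; they are not taken from the report's dike case study, and the corner enumeration is only valid when the model is monotonic in each input.

```python
from itertools import product


def output_range(model, intervals):
    """Range of the model over a box of input intervals.

    Assumes the model is monotonic in every input, so the extremes
    occur at corners of the box (an assumption of this sketch).
    """
    names = list(intervals)
    corners = product(*(intervals[name] for name in names))
    values = [model(**dict(zip(names, corner))) for corner in corners]
    return min(values), max(values)


def pinching_sensitivity(model, intervals, name, point):
    """Percent reduction in output width when one input is pinched to a point."""
    base_lo, base_hi = output_range(model, intervals)
    pinched = dict(intervals, **{name: (point, point)})
    lo, hi = output_range(model, pinched)
    return 100.0 * (1.0 - (hi - lo) / (base_hi - base_lo))


def margin(strength, load, factor):
    """Hypothetical safety margin: strength minus a scaled load."""
    return strength - factor * load


# Hypothetical input intervals (lower bound, upper bound).
intervals = {
    "strength": (8.0, 12.0),
    "load": (2.0, 4.0),
    "factor": (1.0, 1.5),
}

for name, point in [("strength", 10.0), ("load", 3.0), ("factor", 1.25)]:
    reduction = pinching_sensitivity(margin, intervals, name, point)
    print(f"pinching {name} to {point}: {reduction:.2f}% less output uncertainty")
```

Under these assumed ranges, pinching `strength` removes the largest share of the output width, so it would be the first candidate for further empirical study. Extending the sketch to probability boxes or Dempster-Shafer structures would require a library for those objects, which is not shown here.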
Models, forests and trees of York English: Was/were variation as a case study for statistical practice. Manuscript submitted for publication
2010
Short title: Was/were as a case study for statistical practice
BOOTSTRAPPING FINITE MIXTURE MODELS
COMPSTAT’2004 SYMPOSIUM, 2004
Finite mixture regression models are used for modelling unobserved heterogeneity in the population. However, depending on the specifications these models need not be identifiable, which is especially of concern if the parameters are interpreted. As bootstrap methods are already used as a diagnostic tool for linear regression models, we investigate their use for finite mixture models. We show that bootstrapping helps in revealing identifiability problems and that parametric bootstrapping can be used for analyzing the reliability of coefficient estimates.
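A parametric bootstrap of this kind can be sketched as follows. This is a simplified illustration, not the authors' procedure: it uses a two-component one-dimensional Gaussian mixture instead of a mixture regression, a basic EM fit written from scratch, and components ordered by mean to sidestep label switching. All function names, the true parameters and the sample sizes are assumptions made for the example.

```python
import math
import random
from statistics import fmean, pstdev


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_mixture(data, iterations=50):
    """EM for a two-component Gaussian mixture.

    Returns (weight1, mu1, mu2, sigma1, sigma2) with mu1 <= mu2.
    """
    n = len(data)
    ordered = sorted(data)
    w = 0.5
    # Start from the means of the lower and upper halves of the sorted data.
    mu = [fmean(ordered[: n // 2]), fmean(ordered[n // 2:])]
    sigma = [pstdev(data) or 1.0] * 2
    for _ in range(iterations):
        # E-step: responsibility of component 1 for each observation.
        resp = []
        for x in data:
            a = w * normal_pdf(x, mu[0], sigma[0])
            b = (1.0 - w) * normal_pdf(x, mu[1], sigma[1])
            resp.append(a / (a + b) if a + b > 0.0 else 0.5)
        # M-step: weighted means, standard deviations and mixing weight.
        n1 = min(max(sum(resp), 1e-6), n - 1e-6)
        n2 = n - n1
        mu = [sum(r * x for r, x in zip(resp, data)) / n1,
              sum((1.0 - r) * x for r, x in zip(resp, data)) / n2]
        var1 = sum(r * (x - mu[0]) ** 2 for r, x in zip(resp, data)) / n1
        var2 = sum((1.0 - r) * (x - mu[1]) ** 2 for r, x in zip(resp, data)) / n2
        sigma = [max(math.sqrt(var1), 1e-3), max(math.sqrt(var2), 1e-3)]
        w = n1 / n
    if mu[0] > mu[1]:  # order components by mean to avoid label switching
        mu.reverse()
        sigma.reverse()
        w = 1.0 - w
    return (w, mu[0], mu[1], sigma[0], sigma[1])


def simulate(params, n, rng):
    """Draw n observations from the mixture described by params."""
    w, mu1, mu2, s1, s2 = params
    return [rng.gauss(mu1, s1) if rng.random() < w else rng.gauss(mu2, s2)
            for _ in range(n)]


def parametric_bootstrap(data, replicates, rng):
    """Refit the model on datasets simulated from the fitted model."""
    fitted = fit_mixture(data)
    refits = [fit_mixture(simulate(fitted, len(data), rng))
              for _ in range(replicates)]
    spread = tuple(pstdev(column) for column in zip(*refits))
    return fitted, refits, spread


rng = random.Random(0)
observed = simulate((0.5, 0.0, 5.0, 1.0, 1.0), 150, rng)  # hypothetical truth
fitted, refits, spread = parametric_bootstrap(observed, 30, rng)
print("fitted parameters:", fitted)
print("bootstrap standard deviations:", spread)
```

A large or multimodal spread of the refitted parameters across replicates would be one informal sign that the estimates are unreliable or that the model is only weakly identified.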
Dynamic behaviour of connectionist speech recognition with strong latency constraints
Speech Comm
This paper describes the use of connectionist techniques in phonetic speech recognition with strong latency constraints. The constraints are imposed by the task of deriving the lip movements of a synthetic face in real time from the speech signal, by feeding the phonetic string into an articulatory synthesiser. Particular attention has been paid to analysing the interaction between the time evolution model learnt by the multilayer perceptrons and the transition model imposed by the Viterbi decoder, in different latency conditions. Two experiments were conducted in which the time dependencies in the language model (LM) were controlled by a parameter. The results show a strong interaction between the three factors involved, namely the neural network topology, the length of time dependencies in the LM and the decoder latency. Key words: speech recognition, neural network, low latency, nonlinear dynamics
Estimating a bivariate density when there are extra data on one or both components
Biometrika, 2006
Assume we have a dataset, Z say, from the joint distribution of random variables X and Y, and two further, independent datasets, X and Y, from the marginal distributions of X and Y, respectively. We wish to combine X, Y and Z, so as to construct an estimator of the joint density. This problem is readily solved in some parametric circumstances. For example, if the joint distribution were normal then we would combine data from X and Z to estimate the mean and variance of X; proceed analogously to estimate the mean and variance of Y; but use data from Z alone to estimate E(XY). However, the problem is more difficult in a nonparametric setting. There we suggest a copula-based solution, which has potential benefits even when the marginal datasets X and Y are empty. For example, if the copula density is sufficiently smooth in the region where we wish to estimate it, then the effective dimension of the structure that links the marginal distributions is relatively low, and the joint density of X and Y can be estimated with a high degree of accuracy. Similar improvements in performance are available if the marginals are close to being independent. We suggest using wavelet estimators to approximate the copula density, which in cases of statistical interest can be unbounded along boundaries. Our techniques are also useful for solving recently considered related problems, for example where the marginal distributions are determined by parametric models. Therefore the methodology has application beyond the context which motivated it. The methodology is also readily extended to more general multivariate settings.
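The parametric (normal) case described in this abstract can be sketched in a few lines. This is only an illustration of that one example, not of the copula-based estimator; the function name, the argument names and the toy numbers are assumptions.

```python
from statistics import fmean, pvariance


def combined_normal_estimates(z_pairs, extra_x, extra_y):
    """Moment estimates for a bivariate normal from paired and marginal data.

    z_pairs: list of (x, y) pairs from the joint distribution.
    extra_x, extra_y: additional independent samples from each marginal.
    """
    # Marginal moments pool the paired data with the extra marginal data.
    xs = [x for x, _ in z_pairs] + list(extra_x)
    ys = [y for _, y in z_pairs] + list(extra_y)
    mean_x, mean_y = fmean(xs), fmean(ys)
    var_x, var_y = pvariance(xs, mean_x), pvariance(ys, mean_y)
    # E(XY) can only be estimated from the paired data.
    exy = fmean([x * y for x, y in z_pairs])
    return {
        "mean_x": mean_x,
        "mean_y": mean_y,
        "var_x": var_x,
        "var_y": var_y,
        "exy": exy,
        "cov": exy - mean_x * mean_y,
    }


# Toy data, chosen only so the arithmetic is easy to follow.
estimates = combined_normal_estimates(
    z_pairs=[(1.0, 2.0), (3.0, 4.0)],
    extra_x=[2.0, 2.0],
    extra_y=[],
)
print(estimates)
```

Because the covariance and the variances come from different samples, the implied covariance matrix is not guaranteed to be positive definite in small samples, as the toy data above happen to show (a covariance of 1.0 against variances of 0.5 and 1.0).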
SENSITIVITY OF INFERENCES IN FORENSIC GENETICS TO ASSUMPTIONS ABOUT FOUNDING GENES
2009
Many forensic genetics problems can be handled using structured systems of discrete variables, for which Bayesian networks offer an appealing practical modeling framework and allow inferences to be computed by probability propagation methods. However, when standard assumptions are violated (for example, when allele frequencies are unknown, there is identity by descent, or the population is heterogeneous), dependence is generated among founding genes, which makes exact calculation of conditional probabilities by propagation methods less straightforward. Here we illustrate different methodologies for assessing sensitivity to assumptions about founders in forensic genetics problems. These include constrained steepest descent, linear fractional programming and representing dependence by structure. We illustrate these methods on several forensic genetics examples involving criminal identification, simple and complex disputed paternity and DNA mixtures.
Increasing Informativeness in Temporal Annotation
In this paper, we discuss some of the challenges of adequately applying a specification language to an annotation task, as embodied in a specific guideline. In particular, we discuss some issues with TimeML motivated by error analysis on annotated TLINKs in TimeBank. We introduce a document-level information structure we call a narrative container (NC), designed to increase informativeness and accuracy of temporal relation identification. The narrative container is the default interval containing the events being discussed in the text, when no explicit temporal anchor is given. By exploiting this notion in the creation of a new temporal annotation over TimeBank, we were able to reduce inconsistencies and increase informativeness when compared to existing TLINKs in TimeBank.
Subpathwayminer: A Software Package for Flexible Identification of Pathways
Nucleic Acids Res, 2009
