Discrete Multivariate Analysis: Theory and Practice
, 1975
the collaboration of Richard J. Light and Frederick Mosteller.
Computing Maximum Likelihood Estimates in Loglinear Models
, 2006
We develop computational strategies for extended maximum likelihood estimation, as defined in Rinaldo (2006), for general classes of widely used loglinear models under Poisson and product-multinomial sampling schemes. We derive numerically efficient procedures for generating and manipulating design matrices, and we propose various algorithms for computing the extended maximum likelihood estimates of the expectations of the cell counts. These algorithms make it possible to identify the set of estimable cell means for any given observable table and can be used to modify traditional goodness-of-fit tests to accommodate a nonexistent MLE. We describe and take advantage of the connections between extended maximum likelihood
Three Centuries of Categorical Data Analysis: Loglinear Models and Maximum Likelihood Estimation
The common view of the history of contingency tables is that it begins in 1900 with the work of Pearson and Yule, but it extends back at least into the 19th century. Moreover, it remains an active area of research today. In this paper we give an overview of this history, focussing on the development of loglinear models and their estimation via the method of maximum likelihood. S. N. Roy played a crucial role in this development with two papers coauthored with his students S. K. Mitra and Marvin Kastenbaum, at roughly the temporal midpoint of this development. Then we describe a problem that eluded Roy and his students, that of the implications of sampling zeros for the existence of maximum likelihood estimates for loglinear models. Understanding the problem of nonexistence is crucial to the analysis of large sparse contingency tables. We introduce some relevant results from the application of algebraic geometry to the study of this statistical problem.
Distribution-Free Multivariate Process Control Based on Log-Linear Modeling
This paper considers statistical process control (SPC) when the process measurement is multivariate. In the literature, most existing multivariate SPC procedures assume that the in-control distribution of the multivariate process measurement is known and that it is Gaussian. In applications, however, the measurement distribution is usually unknown and needs to be estimated from data. Furthermore, multivariate measurements often do not follow a Gaussian distribution (e.g., when some measurement components are discrete). We demonstrate that results from conventional multivariate SPC procedures are usually unreliable when the data are non-Gaussian. Existing statistical tools for describing multivariate non-Gaussian data, or for transforming multivariate non-Gaussian data to multivariate Gaussian data, are limited, making appropriate multivariate SPC difficult in such cases. In this paper, we suggest a methodology for estimating the in-control multivariate measurement distribution when a set of in-control data is available; it is based on log-linear modeling and takes into account the association structure among the measurement components. Based on this estimated in-control distribution, a multivariate CUSUM procedure for detecting shifts in the location parameter vector of the measurement distribution is also suggested for Phase II SPC. This procedure does not depend on the Gaussian distribution assumption; thus, it is appropriate for most multivariate SPC problems.
Univariate and Bivariate Loglinear Models for Discrete Test Score Distributions
, 2000
The well-developed theory of exponential families of distributions is applied to the problem of fitting the univariate histograms and discrete bivariate frequency distributions that often arise in the analysis of test scores. These models are powerful tools for many forms of parametric data smoothing and are particularly well-suited to problems in which there is little or no theory to guide a choice of probability models, e.g., smoothing a distribution to eliminate roughness and zero frequencies in order to equate scores from different tests. Attention is given to efficient computation of the maximum likelihood estimates of the parameters using Newton's Method and to computationally efficient methods for obtaining the asymptotic standard errors of the fitted frequencies and proportions. We discuss tools that can be used to diagnose the quality of the fitted frequencies for both the univariate and the bivariate cases. Five examples, using real data, are used to illustrate the methods of this paper.
The organizers’ ecology: An empirical study of foreign banks in Shanghai. Org. Sci
, 2006
Statistical Methods for Hazards and Health
The objective of this article is to document the need for further development of statistical methodology, training of more statisticians, and improved communication between statisticians and the many other disciplines engaged in environmental research. Discussion of the adequacy of the current statistical methodology requires the use of examples, which will hopefully not be offensive to the authors. Reference is made to recent developments and areas of unsolved problems delineated in three broad areas: enumeration data and adjusted rates; time series; and multiple regression. A brief outline of the ideas behind current methods of analyzing discrete data is followed by a demonstration of their utility using an example of the effects of exposure, sex, and education on bronchitis rates. Examples are listed of the ubiquity of the time component when relating pollution effects to each other and to health effects. An artificial example is used to emphasize the effects of time-dependent autocorrelations, trends, and cycles. References are given to a variety of new developments in time-series analysis. Discussion of the pitfalls in multiple regression analysis and of possible alternative approaches is largely based on two recent reviews and includes references to recent developments of robust techniques.
Keying Ye
, 1999
Random variables defined on the natural numbers may often be approximated by Poisson variables. Just as normal approximations may be improved by saddlepoint methods, Poisson approximations may be substantially improved by tilting, expansion, and other related methods. This work will develop and examine the use of these methods, as well as present
, 2010
Pseudo-score confidence intervals for parameters in discrete statistical models