Results 1–10 of 33
Bandwidth Selection in Kernel Density Estimation: A Review
 CORE and Institut de Statistique
Abstract

Cited by 49 (1 self)
Although nonparametric kernel density estimation is nowadays a standard technique in exploratory data analysis, there is still considerable dispute over how to assess the quality of the estimate and which choice of bandwidth is optimal. The main argument concerns whether one should use the Integrated Squared Error or the Mean Integrated Squared Error to define the optimal bandwidth. In recent years much research has been devoted to bandwidth selection methods that try to estimate the optimal bandwidth under either of these error criteria. This paper summarizes the most important arguments for each criterion and gives an overview of the existing bandwidth selection methods. We also summarize the small-sample behaviour of these methods as assessed in several Monte Carlo studies. These Monte Carlo studies are all restricted to very small sample sizes because the numerical effort of estimating the optimal bandwidth by any of these bandwidth selection methods is proporti...
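To make the bandwidth-selection problem concrete, here is a minimal sketch of one of the selectors this review covers, least-squares cross-validation, for a one-dimensional Gaussian kernel. The function names (`lscv`, `norm_pdf`) are ours, not the paper's.

```python
import numpy as np

def norm_pdf(u, s=1.0):
    """Gaussian density with standard deviation s."""
    return np.exp(-0.5 * (u / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

def lscv(data, h):
    """Least-squares cross-validation score LSCV(h) for a Gaussian kernel.

    LSCV(h) = int fhat^2 - (2/n) * sum_i fhat_{-i}(x_i); the bandwidth
    minimizing this score estimates the ISE-optimal bandwidth.
    """
    x = np.asarray(data, dtype=float)
    n = x.size
    d = x[:, None] - x[None, :]                          # pairwise differences
    term1 = norm_pdf(d, np.sqrt(2.0) * h).sum() / n**2   # int fhat^2 (exact for Gaussian)
    k = norm_pdf(d / h) / h
    np.fill_diagonal(k, 0.0)                             # leave-one-out estimates
    term2 = 2.0 * k.sum() / (n * (n - 1))
    return term1 - term2
```

Scanning `lscv` over a grid of bandwidths and taking the argmin gives the LSCV bandwidth; both under- and oversmoothing raise the score.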
Estimating Bayes Factors via Posterior Simulation with the Laplace-Metropolis Estimator
 Journal of the American Statistical Association
, 1994
Abstract

Cited by 33 (11 self)
The key quantity needed for Bayesian hypothesis testing and model selection is the marginal likelihood for a model, also known as the integrated likelihood, or the marginal probability of the data. In this paper we describe a way to use posterior simulation output to estimate marginal likelihoods. We describe the basic Laplace-Metropolis estimator for models without random effects. For models with random effects the compound Laplace-Metropolis estimator is introduced. This estimator is applied to data from the World Fertility Survey and shown to give accurate results. Batching of simulation output is used to assess the uncertainty involved in using the compound Laplace-Metropolis estimator. The method allows us to test for the effects of independent variables in a random effects model, and also to test for the presence of the random effects. KEY WORDS: Laplace-Metropolis estimator; Random effects models; Marginal likelihoods; Posterior simulation; World Fertility Survey. 1 Introduction...
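A rough sketch of the basic estimator's idea: a Laplace approximation to the marginal likelihood built entirely from posterior simulation output. This sketch assumes the mode is located at the highest-posterior draw and the curvature is taken from the sample covariance of the draws; the paper's exact construction may differ.

```python
import numpy as np

def laplace_metropolis(draws, log_post):
    """Laplace-Metropolis estimate of the log marginal likelihood.

    draws: (T, d) posterior simulation output.
    log_post: function returning log prior + log likelihood at a draw.
    Mode location and covariance are taken from the draws themselves
    (an assumption of this sketch).
    """
    draws = np.atleast_2d(np.asarray(draws, dtype=float))
    d = draws.shape[1]
    lp = np.array([log_post(t) for t in draws])
    cov = np.atleast_2d(np.cov(draws, rowvar=False))
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * d * np.log(2.0 * np.pi) + 0.5 * logdet + lp.max()
```

For a conjugate normal model the Laplace approximation is exact, which makes the estimator easy to sanity-check against the analytic marginal likelihood.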
Cross-Validation of Multivariate Densities
 Journal of the American Statistical Association
, 1992
Abstract

Cited by 25 (1 self)
In recent years, the focus of study in smoothing parameter selection for kernel density estimation has been on the univariate case, while multivariate kernel density estimation has been largely neglected. In part, this may be due to the perception that calibrating multivariate densities is substantially more difficult. In this paper, we explicitly derive and compare multivariate versions of the bootstrap method of Taylor (1989), the least-squares cross-validation method developed by Bowman (1984) and Rudemo (1982), and a biased cross-validation method similar to that of Scott and Terrell (1987) for multivariate kernel estimation using the product kernel estimator. The theoretical behavior of these cross-validation algorithms is shown to improve (surprisingly) as the dimension increases, approaching the best rate of O(n^{-1/2}). Simulation studies suggest that the new biased cross-validation method performs quite well and with reasonable variability as compared to the other two ...
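The product kernel estimator at the heart of these methods can be sketched as follows; this is a hypothetical minimal implementation assuming a Gaussian kernel in each coordinate.

```python
import numpy as np

def product_kernel_density(x, data, h):
    """Product-kernel density estimate at a point x.

    data: (n, d) sample; h: length-d vector of per-coordinate bandwidths.
    fhat(x) = (1/n) sum_i prod_d K((x_d - X_{i,d}) / h_d) / h_d.
    """
    u = (np.asarray(x) - np.asarray(data)) / h        # (n, d) standardized offsets
    k = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)    # per-coordinate Gaussian kernels
    return float(np.mean(np.prod(k / h, axis=1)))     # average of kernel products
```

With per-coordinate bandwidths as the only smoothing parameters, the cross-validation criteria compared in the paper score choices of the vector h directly.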
Inference for Deterministic Simulation Models: The Bayesian Melding Approach
 JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
, 2000
Abstract

Cited by 25 (4 self)
Deterministic simulation models are used in many areas of science, engineering and policy-making. Typically, they are complex models that attempt to capture underlying mechanisms in considerable detail, and they have many user-specified inputs. The inputs are often specified by some form of trial-and-error approach in which plausible values are postulated, the corresponding outputs inspected, and the inputs modified until plausible outputs are obtained. Here we address the issue of more formal inference for such models. Raftery et al. (1995a) proposed the Bayesian synthesis approach, in which the available information about both inputs and outputs was encoded in a probability distribution and inference was made by restricting this distribution to the submanifold specified by the model. Wolpert (1995) showed that this is subject to the Borel paradox, according to which the results can depend on the parameterization of the model. We show that this dependence is due to the presence of a prior on the outputs. We propose a modified approach, called Bayesian melding, which takes full account of information and uncertainty about both inputs and outputs to the model, while avoiding the Borel paradox. This is done by recognizing the existence of two priors, one implicit and one explicit, on each input and output; these are combined via logarithmic pooling. Bayesian melding is then
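For Gaussian densities, the logarithmic pooling step mentioned in this abstract has a closed form: the pool q1^alpha * q2^(1-alpha) is again Gaussian, with precisions adding under weights alpha and 1 - alpha. A small sketch in our notation, not the paper's:

```python
import numpy as np

def log_pool_gaussians(m1, s1, m2, s2, alpha=0.5):
    """Logarithmic pool q1^alpha * q2^(1-alpha) of two Gaussian densities.

    Returns the mean and standard deviation of the pooled (Gaussian) density.
    """
    p1, p2 = alpha / s1**2, (1.0 - alpha) / s2**2   # weighted precisions
    prec = p1 + p2
    mean = (p1 * m1 + p2 * m2) / prec               # precision-weighted mean
    return mean, np.sqrt(1.0 / prec)
```

With alpha = 0.5 and equal variances this reduces to averaging the two means, which matches the intuition of reconciling an implicit and an explicit prior on the same quantity.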
Hypothesis Testing and Model Selection Via Posterior Simulation
 In Practical Markov Chain
, 1995
Abstract

Cited by 24 (1 self)
Introduction To motivate the methods described in this chapter, consider the following inference problem in astronomy (Soubiran, 1993). Until fairly recently, it has been believed that the Galaxy consists of two stellar populations, the disk and the halo. More recently, it has been hypothesized that there are in fact three stellar populations, the old (or thin) disk, the thick disk, and the halo, distinguished by their spatial distributions, their velocities, and their metallicities. These hypotheses have different implications for theories of the formation of the Galaxy. Some of the evidence for deciding whether there are two or three populations is shown in Figure 1, which shows radial and rotational velocities for n = 2,370 stars. A natural model for this situation is a mixture model with J components, namely y_i = Σ_{j=1}^{J} ρ_j ...
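The J-component mixture log-likelihood underlying such a model can be sketched as follows; this is a univariate Gaussian-mixture illustration, whereas the paper's velocity data are multivariate.

```python
import numpy as np

def mixture_loglik(y, weights, means, sds):
    """Log-likelihood of univariate data under a J-component Gaussian mixture.

    weights, means, sds are length-J arrays; weights must sum to 1.
    """
    y = np.asarray(y, dtype=float)[:, None]           # (n, 1)
    comp = (np.exp(-0.5 * ((y - means) / sds) ** 2)
            / (sds * np.sqrt(2.0 * np.pi)))           # (n, J) component densities
    return float(np.sum(np.log(comp @ np.asarray(weights))))
```

Comparing the maximized log-likelihoods of the J = 2 and J = 3 models (via posterior simulation, as the chapter goes on to describe) is what settles the two-versus-three-populations question.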
Bowhead whale, Balaena mysticetus, population size estimated from acoustic and visual census data collected near
, 1994
Abstract

Cited by 16 (7 self)
Commission). We are very grateful to Andrew A. Schaffner for excellent research assistance. We thank Dr. Thomas F. Albert, Craig George and other scientists and personnel from the Borough's Department of Wildlife Management, the many other researchers with whom we have worked, most of whose names appear on papers in our reference list, the census crew, and the Eskimo hunters of the Borough, for their contributions to our understanding of bowhead whales and the census. We are also grateful to Geof Givens for useful discussions, and to Doug Butterworth and Andre Punt. Estimating the population size and rate of increase of bowhead whales, Balaena mysticetus, is important because bowheads were the first species of great whale for which commercial whaling stopped, and so their status indicates the recovery prospects of other great whales, and also because this information is used by the International Whaling Commission (IWC) to set the aboriginal subsistence whaling quota for Alaskan Eskimos. We describe the 1993 visual and acoustic census off Point Barrow, Alaska, which provides the best data available for estimating these quantities. We outline the definitive version of two statistical methods for estimating the population, the generalized removal method and the Bayes empirical Bayes
FINE: Fisher information nonparametric embedding
 IEEE Transactions on Signal Processing
Abstract

Cited by 12 (9 self)
Abstract—We consider the problems of clustering, classification, and visualization of high-dimensional data when no straightforward Euclidean representation exists. In this paper, we propose using the properties of information geometry and statistical manifolds in order to define similarities between data sets using the Fisher information distance. We will show that this metric can be approximated using entirely nonparametric methods, as the parameterization and geometry of the manifold are generally unknown. Furthermore, by using multidimensional scaling methods, we are able to reconstruct the statistical manifold in a low-dimensional Euclidean space, enabling effective learning on the data. As a whole, we refer to our framework as Fisher Information Nonparametric Embedding (FINE) and illustrate its uses on practical problems, including a biomedical application and document classification. Index Terms—Information geometry, statistical manifold, dimensionality reduction, multidimensional scaling. 1
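The multidimensional scaling step that FINE relies on can be sketched with classical MDS. This is a generic implementation of the embedding step only; the nonparametric approximation of the Fisher information distance itself is not shown.

```python
import numpy as np

def classical_mds(dist, k=2):
    """Classical multidimensional scaling of a symmetric distance matrix.

    Returns an (n, k) Euclidean embedding whose pairwise distances
    approximate those in dist.
    """
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n            # centering matrix
    b = -0.5 * j @ (dist ** 2) @ j                 # double-centered Gram matrix
    w, v = np.linalg.eigh(b)                       # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:k]                  # top-k eigenpairs
    return v[:, idx] * np.sqrt(np.clip(w[idx], 0.0, None))
```

Feeding a matrix of approximated Fisher information distances between data sets into such an embedding yields the low-dimensional Euclidean representation on which standard learning algorithms can operate.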
Density estimation
 Statistical Science
, 2004
Abstract

Cited by 12 (1 self)
Abstract. This paper provides a practical description of density estimation based on kernel methods. An important aim is to encourage practicing statisticians to apply these methods to data. As such, reference is made to implementations of these methods in R, S-PLUS and SAS. Key words and phrases: Kernel density estimation, bandwidth selection, local likelihood density estimates, data sharpening. 1.
Dimensionality reduction on statistical manifolds
, 2009
Abstract

Cited by 7 (5 self)
This work could not have been possible without the support of many individuals, and I would be remiss if I did not take the opportunity to thank them. To start, I give the utmost thanks to my advisor, Professor Alfred Hero. He not only took me under his wing as a research assistant, but was a major contributor to my professional development. While his otherworldly knowledge base was critical towards my maturation as a researcher, his motivation, mentorship, and words of advice kept me going during difficult and stressful times. I would also like to thank Professor Raviv Raich, who has worked side-by-side with me throughout my entire research experience. Whenever I came upon a road block, I knew I could count on Raviv to have the patience and wherewithal to guide me through. My ability to progress so quickly throughout this process was due in large part to my amazing research project. I owe this entirely to Dr. William Finn and the Department of Pathology at the University of Michigan, who came to us with an idea and a lot of data. Dr. Finn was always available for discussion and insight into the process of flow cytometry, and throughout my development he has shown a genuine excitement for all of the work I have done. Without his knowledge, support, and enthusiasm, none of this work would have been completed. This work has also benefited from discussions with the remainder of my committee members. A special thanks goes to Professor Elizaveta Levina and Professor Clayton Scott for their input and support. Their level of expertise in many of the areas directly coinciding with my research topics was very beneficial, and the third-party
Constrained Maximum Likelihood
, 1996
Abstract

Cited by 6 (0 self)
Constrained Maximum Likelihood (CML) is a new software module developed at Aptech Systems for the generation of maximum likelihood estimates of statistical models with general constraints on parameters. These constraints can be linear or nonlinear, equality or inequality. The software uses the Sequential Quadratic Programming method with various descent algorithms to iterate from a given starting point to the maximum likelihood estimates. Standard asymptotic theory asserts that statistical inference regarding inequality-constrained parameters does not require special techniques, because for a large enough sample there will always be a confidence region at the selected level of confidence that avoids the constraint boundaries. "Sufficiently large," however, can be quite large, in the millions of cases, when the true parameter values are very close to these boundaries. In practice, our finite samples may not be large enough for confidence regions to avoid constraint boundaries, and this has ...
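To illustrate how an inequality-constrained estimate can land exactly on a boundary (the situation where the abstract warns that standard asymptotics may fail), here is a deliberately simple projected-gradient stand-in for the SQP iterations the abstract describes. Everything in this sketch, including the function name, is our illustration and not part of CML's API.

```python
import numpy as np

def projected_gradient_mle(neg_loglik_grad, x0, project, lr=0.05, steps=500):
    """Toy constrained estimation: gradient descent on the negative
    log-likelihood, followed by projection onto the feasible set.

    A simple stand-in for Sequential Quadratic Programming iterations.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = project(x - lr * neg_loglik_grad(x))  # descend, then re-project
    return x
```

For a normal mean constrained to mu >= 0, a negative sample mean drives the iterate onto the boundary mu = 0, exactly the case where constrained confidence regions need special care.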