Results 1-10 of 157
Classification by pairwise coupling
, 1998
Cited by 277 (0 self)
We discuss a strategy for polychotomous classification that involves estimating class probabilities for each pair of classes, and then coupling the estimates together. The coupling model is similar to the Bradley-Terry method for paired comparisons. We study the nature of the class probability estimates that arise, and examine the performance of the procedure in real and simulated datasets. Classifiers used include linear discriminants, nearest neighbors, and the support vector machine.
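The coupling step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes that pairwise estimates r[i][j] of P(class i | class i or class j) are already available. The function name `couple_pairwise`, the equal default weights, and the multiplicative update (the usual fixed-point iteration for Bradley-Terry models) are assumptions made for the example.

```python
def couple_pairwise(r, n=None, iters=1000, tol=1e-12):
    """Couple pairwise estimates r[i][j] ~ P(i | i or j) into one vector p.

    Hypothetical helper: looks for p with p[i] / (p[i] + p[j]) close to
    r[i][j], using a Bradley-Terry style multiplicative update.
    n[i][j] is an optional weight (e.g. number of training cases) per pair.
    """
    k = len(r)
    if n is None:
        n = [[1.0] * k for _ in range(k)]  # assumption: equal pair weights
    p = [1.0 / k] * k                      # start from uniform probabilities
    for _ in range(iters):
        new_p = []
        for i in range(k):
            # Weighted sum of the observed pairwise estimates for class i.
            num = sum(n[i][j] * r[i][j] for j in range(k) if j != i)
            # Same sum with the pairwise values implied by the current p.
            den = sum(n[i][j] * p[i] / (p[i] + p[j]) for j in range(k) if j != i)
            new_p.append(p[i] * num / den)
        total = sum(new_p)
        new_p = [v / total for v in new_p]  # renormalize to sum to one
        change = max(abs(a - b) for a, b in zip(new_p, p))
        p = new_p
        if change < tol:
            break
    return p


# Usage: pairwise estimates that are exactly consistent with known
# class probabilities should be coupled back to those probabilities.
true_p = [0.5, 0.3, 0.2]
r = [[0.0 if i == j else true_p[i] / (true_p[i] + true_p[j])
      for j in range(3)] for i in range(3)]
coupled = couple_pairwise(r)
```

When the pairwise estimates come from separately trained classifiers they are generally not mutually consistent, and the coupled vector is then a compromise rather than an exact solution.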
Model selection and accounting for model uncertainty in graphical models using Occam's window
, 1993
Cited by 265 (46 self)
We consider the problem of model selection and accounting for model uncertainty in high-dimensional contingency tables, motivated by expert system applications. The approach most used currently is a stepwise strategy guided by tests based on approximate asymptotic P-values leading to the selection of a single model; inference is then conditional on the selected model. The sampling properties of such a strategy are complex, and the failure to take account of model uncertainty leads to underestimation of uncertainty about quantities of interest. In principle, a panacea is provided by the standard Bayesian formalism which averages the posterior distributions of the quantity of interest under each of the models, weighted by their posterior model probabilities. Furthermore, this approach is optimal in the sense of maximising predictive ability. However, this has not been used in practice because computing the posterior model probabilities is hard and the number of models is very large (often greater than 10^11). We argue that the standard Bayesian formalism is unsatisfactory and we propose an alternative Bayesian approach that, we contend, takes full account of the true model uncertainty by averaging over a much smaller set of models. An efficient search algorithm is developed for finding these models. We consider two classes of graphical models that arise in expert systems: the recursive causal models and the decomposable
The State of Record Linkage and Current Research Problems
 Statistical Research Division, U.S. Census Bureau
, 1999
Cited by 218 (7 self)
This paper provides an overview of methods and systems developed for record linkage. Modern record linkage begins with the pioneering work of Newcombe and is especially based on the formal mathematical model of Fellegi and Sunter. In their seminal work, Fellegi and Sunter introduced many powerful ideas for estimating record linkage parameters and other ideas that still influence record linkage today. Record linkage research is characterized by its synergism of statistics, computer science, and operations research. Many difficult algorithms have been developed and put in software systems. Record linkage practice is still very limited. Some limits are due to existing software. Other limits are due to the difficulty in automatically estimating matching parameters and error rates, with current research highlighted by the work of Larsen and Rubin. Keywords: computer matching, modeling, iterative fitting, string comparison, optimization RÉSUMÉ This article gives an overview of the ...
Algebraic Algorithms for Sampling from Conditional Distributions
 Annals of Statistics
, 1995
Cited by 183 (15 self)
We construct Markov chain algorithms for sampling from discrete exponential families conditional on a sufficient statistic. Examples include generating tables with fixed row and column sums and higher dimensional analogs. The algorithms involve finding bases for associated polynomial ideals and so an excursion into computational algebraic geometry.
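For the two-way table example mentioned in this abstract, a basic Markov chain step can be sketched as below. It is a minimal illustration under stated assumptions, not the paper's general construction. It uses only the elementary move that adds +1 and -1 on a 2 x 2 sub-rectangle, it targets the uniform distribution over non-negative tables with the given margins, and the function name `margin_preserving_walk` is hypothetical. Other target distributions would need an extra Metropolis acceptance step, which is omitted here.

```python
import random


def margin_preserving_walk(table, steps, seed=0):
    """Random walk over non-negative integer tables with fixed margins.

    Hypothetical helper. Each step picks two distinct rows and two distinct
    columns and proposes adding
        +1 at (i1, j1), +1 at (i2, j2), -1 at (i1, j2), -1 at (i2, j1).
    The move leaves every row sum and column sum unchanged. Proposals that
    would create a negative entry are rejected and the chain stays put.
    """
    rng = random.Random(seed)
    t = [row[:] for row in table]        # work on a copy
    n_rows, n_cols = len(t), len(t[0])
    for _ in range(steps):
        # Ordered samples: swapping i1 and i2 gives the move of opposite
        # sign, so the proposal is symmetric.
        i1, i2 = rng.sample(range(n_rows), 2)
        j1, j2 = rng.sample(range(n_cols), 2)
        if t[i1][j2] > 0 and t[i2][j1] > 0:
            t[i1][j1] += 1
            t[i2][j2] += 1
            t[i1][j2] -= 1
            t[i2][j1] -= 1
    return t


# Usage: the margins of the result match those of the starting table.
start = [[3, 1, 2], [0, 4, 1], [2, 2, 5]]
sampled = margin_preserving_walk(start, steps=500)
```

The table needs at least two rows and two columns. For higher-dimensional analogs, simple moves like this one are not always enough to reach every table, which is where the bases for polynomial ideals mentioned in the abstract come in.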
An algebra for probabilistic databases
Cited by 128 (1 self)
An algebra is presented for a simple probabilistic data model that may be regarded as an extension of the standard relational model. The probabilistic algebra is developed in such a way that (restricted to α-acyclic database schemes) the relational algebra is a homomorphic image of it. Strictly probabilistic results are emphasized. Variations on the basic probabilistic data model are discussed. The algebra is used to explicate a commonly used statistical smoothing procedure and is shown to be potentially very useful for decision support with uncertain information.
Approximate Bayes Factors and Accounting for Model Uncertainty in Generalized Linear Models
, 1993
Cited by 98 (28 self)
Ways of obtaining approximate Bayes factors for generalized linear models are described, based on the Laplace method for integrals. I propose a new approximation which uses only the output of standard computer programs such as GLIM; this appears to be quite accurate. A reference set of proper priors is suggested, both to represent the situation where there is not much prior information, and to assess the sensitivity of the results to the prior distribution. The methods can be used when the dispersion parameter is unknown, when there is overdispersion, to compare link functions, and to compare error distributions and variance functions. The methods can be used to implement the Bayesian approach to accounting for model uncertainty. I describe an application to inference about relative risks in the presence of control factors where model uncertainty is large and important. Software to implement the
How Many Iterations in the Gibbs Sampler?
 In Bayesian Statistics 4
, 1992
Cited by 97 (5 self)
When the Gibbs sampler is used to estimate posterior distributions (Gelfand and Smith, 1990), the question of how many iterations are required is central to its implementation. When interest focuses on quantiles of functionals of the posterior distribution, we describe an easily implemented method for determining the total number of iterations required, and also the number of initial iterations that should be discarded to allow for "burn-in". The method uses only the Gibbs iterates themselves, and does not, for example, require external specification of characteristics of the posterior density. Here the method is described for the situation where one long run is generated, but it can also be easily applied if there are several runs from different starting points. It also applies more generally to Markov chain Monte Carlo schemes other than the Gibbs sampler. It can also be used when several quantiles are to be estimated, when the quantities of interest are probabilities rath...
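As a rough point of reference only, the sketch below computes how many draws would be needed if the iterates were independent. It is not the method of this paper, which works from the dependent Gibbs iterates and also determines the burn-in. The setup is an assumption made for illustration: the cumulative probability q of a quantile is to be estimated to within plus or minus r with probability s, using the normal approximation to the binomial. The function name `min_iterations_independent` is hypothetical.

```python
import math
from statistics import NormalDist


def min_iterations_independent(q, r, s):
    """Draws needed to estimate a probability q within +/- r with probability s.

    Hypothetical baseline that assumes independent draws: the estimate of q
    is then a binomial proportion with variance q * (1 - q) / n, and the
    normal approximation gives n >= z**2 * q * (1 - q) / r**2.
    """
    z = NormalDist().inv_cdf((1.0 + s) / 2.0)  # two-sided normal quantile
    return math.ceil(z * z * q * (1.0 - q) / (r * r))


# Usage: the 0.025 quantile, estimated to within 0.005 with probability 0.95.
n_min = min_iterations_independent(q=0.025, r=0.005, s=0.95)
print(n_min)  # prints 3746
```

Correlated output from a Markov chain sampler will usually need more iterations than this independent-draw figure, so the number is best read as a lower bound.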
Matching and Record Linkage
 Business Survey Methods
, 1995
Cited by 92 (15 self)
INTRODUCTION Matching has a long history of uses in statistical surveys and administrative data development. A business register consisting of names, addresses, and other identifying information such as total financial receipts might be constructed from tax and employment data bases (see chapters by Colledge, Nijhowne, and Archer). A survey of retail establishments or agricultural establishments might combine results from an area frame and a list frame. To produce a combined estimator, units from the area frame would need to be identified in the list frame (see Vogel-Kott chapter). To estimate the size of a (sub)population via capture-recapture techniques, one needs to accurately determine units common to two or more independent listings (Sekar and Deming 1949; Scheuren 1983; Winkler 1989b). Samples must be drawn appropriately to estimate overlap (Deming and Gleser 1959). Rather than develop a special survey to collect data for policy decisions, it might be more appropriate t
Toward a method of selecting among computational models of cognition
 Psychological Review
, 2002
Cited by 75 (4 self)
The question of how one should decide among competing explanations of data is at the heart of the scientific enterprise. Computational models of cognition are increasingly being advanced as explanations of behavior. The success of this line of inquiry depends on the development of robust methods to guide the evaluation and selection of these models. This article introduces a method of selecting among mathematical models of cognition known as minimum description length, which provides an intuitive and theoretically well-grounded understanding of why one model should be chosen. A central but elusive concept in model selection, complexity, can also be derived with the method. The adequacy of the method is demonstrated in 3 areas of cognitive modeling: psychophysics, information integration, and categorization. How should one choose among competing theoretical explanations of data? This question is at the heart of the scientific enterprise, regardless of whether verbal models are being tested in an experimental setting or computational models are being evaluated in simulations. A number of criteria have been proposed to assist in this endeavor, summarized nicely by Jacobs and Grainger