Results 1–10 of 13
The State of Record Linkage and Current Research Problems
Statistical Research Division, U.S. Census Bureau, 1999
Abstract

Cited by 217 (7 self)
This paper provides an overview of methods and systems developed for record linkage. Modern record linkage begins with the pioneering work of Newcombe and is especially based on the formal mathematical model of Fellegi and Sunter. In their seminal work, Fellegi and Sunter introduced many powerful ideas for estimating record linkage parameters and other ideas that still influence record linkage today. Record linkage research is characterized by its synergism of statistics, computer science, and operations research. Many difficult algorithms have been developed and put in software systems. Record linkage practice is still very limited. Some limits are due to existing software. Other limits are due to the difficulty in automatically estimating matching parameters and error rates, with current research highlighted by the work of Larsen and Rubin. Keywords: computer matching, modeling, iterative fitting, string comparison, optimization RÉSUMÉ: This article provides an overview of the ...
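The core Fellegi-Sunter idea behind the systems this survey covers can be sketched in a few lines: each comparison field contributes a log-likelihood-ratio weight, and the summed weight is compared against thresholds. All field names and m-/u-probabilities below are hypothetical illustrations, not values from the paper; in practice they are estimated, e.g. by EM.

```python
from math import log2

# Hypothetical m-probabilities (agreement given a true match) and
# u-probabilities (agreement given a non-match) for three fields.
FIELDS = {
    "surname":    {"m": 0.95, "u": 0.01},
    "first_name": {"m": 0.90, "u": 0.05},
    "birth_year": {"m": 0.98, "u": 0.02},
}

def match_weight(agreement):
    """Sum of log2 likelihood ratios over the comparison fields."""
    total = 0.0
    for field, p in FIELDS.items():
        if agreement[field]:
            total += log2(p["m"] / p["u"])              # agreement weight
        else:
            total += log2((1 - p["m"]) / (1 - p["u"]))  # disagreement weight
    return total

# Pairs scoring above an upper threshold are declared links, below a lower
# threshold non-links, and pairs in between go to clerical review.
full_agree = match_weight({"surname": True, "first_name": True, "birth_year": True})
year_only = match_weight({"surname": False, "first_name": False, "birth_year": True})
```

A pair agreeing on all three fields scores far above one agreeing only on birth year, which is what makes the threshold decision rule workable.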
Posterior Predictive Assessment of Model Fitness Via Realized Discrepancies
Statistica Sinica, 1996
Abstract

Cited by 166 (28 self)
Abstract: This paper considers Bayesian counterparts of the classical tests for goodness of fit and their use in judging the fit of a single Bayesian model to the observed data. We focus on posterior predictive assessment, in a framework that also includes conditioning on auxiliary statistics. The Bayesian formulation facilitates the construction and calculation of a meaningful reference distribution not only for any (classical) statistic, but also for any parameter-dependent “statistic” or discrepancy. The latter allows us to propose the realized discrepancy assessment of model fitness, which directly measures the true discrepancy between data and the posited model, for any aspect of the model which we want to explore. The computation required for the realized discrepancy assessment is a straightforward byproduct of the posterior simulation used for the original Bayesian analysis. We illustrate with three applied examples. The first example, which serves mainly to motivate the work, illustrates the difficulty of classical tests in assessing the fitness of a Poisson model to a positron emission tomography image that is constrained to be nonnegative. The second and third examples illustrate the details of the posterior predictive approach in two problems: estimation in a model with inequality constraints on the parameters, and estimation in a mixture model. In all three examples, standard test statistics (either a χ2 or a likelihood ratio) are not pivotal: the difficulty is not just how to compute the reference distribution for the test, but that in the classical framework no such distribution exists, independent of the unknown model parameters. Key words and phrases: Bayesian p-value, χ2 test, discrepancy, graphical assessment, mixture model, model criticism, posterior predictive p-value, prior predictive
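The posterior predictive check described in this abstract can be sketched on a toy conjugate model. The Beta-Binomial setup, the χ2-type discrepancy, and the data values below are illustrative choices, not taken from the paper; the point is only that the check is a byproduct of ordinary posterior simulation.

```python
import random

random.seed(0)

# Observed data: y successes in n Bernoulli trials.  Under a uniform
# Beta(1, 1) prior the posterior for theta is Beta(1 + y, 1 + n - y).
n, y = 50, 31

def discrepancy(count, theta):
    """Chi-square-type realized discrepancy D(count; theta)."""
    expected = n * theta
    return (count - expected) ** 2 / (expected * (1 - theta))

draws, exceed = 5000, 0
for _ in range(draws):
    theta = random.betavariate(1 + y, 1 + n - y)            # posterior draw
    y_rep = sum(random.random() < theta for _ in range(n))  # replicated data
    if discrepancy(y_rep, theta) >= discrepancy(y, theta):
        exceed += 1

ppp = exceed / draws  # posterior predictive p-value; ~0.5 signals good fit
```

Because the discrepancy depends on theta, no classical reference distribution is needed: both the realized and replicated discrepancies are evaluated at the same posterior draw.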
Improved decision rules in the Fellegi-Sunter model of record linkage
in American Statistical Association Proceedings of Survey Research Methods Section, 1993
Abstract

Cited by 33 (12 self)
Many applications of the Fellegi-Sunter model use simplifying assumptions and ad hoc modifications to improve matching efficacy. Because of model misspecification, distinctive approaches developed in one application typically cannot be used in other applications and do not always make use of advances in statistical and computational theory. An Expectation-Maximization (EMH) algorithm that constrains the estimates to a convex subregion of the parameter space is given. The EMH algorithm provides probability estimates that yield better decision rules than unconstrained estimates. The algorithm is related to results of Meng and Rubin (1993) on Multi-Cycle Expectation-Conditional Maximization algorithms and makes use of results of Haberman (1977) that hold for large classes of log-linear models. Key Words: MCECM Algorithm, Latent Class, Computer Matching, Error Rate This paper provides a theory for obtaining constrained maximum likelihood estimates for latent-class, log-linear models on finite state spaces. The work is related to Expectation-Maximization (EM) algorithms by Meng and Rubin (1993) for obtaining unconstrained maximum likelihood estimates. Meng and Rubin generalized the original ideas of Dempster,
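The constrained EMH algorithm itself is beyond a short snippet, but the unconstrained EM baseline it improves on can be sketched for a two-class latent-class model with conditionally independent binary agreement fields, the standard latent-class setup in record linkage. The synthetic data, starting values, and field count below are all hypothetical.

```python
import random

random.seed(1)

# Synthetic comparison vectors over three binary agreement fields:
# true matches agree with high probability, non-matches rarely do.
# (All probabilities and counts here are hypothetical.)
def simulate(n_match=200, n_non=800):
    pairs = []
    for _ in range(n_match):
        pairs.append(tuple(int(random.random() < p) for p in (0.95, 0.90, 0.90)))
    for _ in range(n_non):
        pairs.append(tuple(int(random.random() < p) for p in (0.05, 0.10, 0.10)))
    return pairs

def em(pairs, iters=100):
    pi = 0.5                 # P(pair is a match)
    m = [0.8, 0.8, 0.8]      # P(field agrees | match)
    u = [0.2, 0.2, 0.2]      # P(field agrees | non-match)
    for _ in range(iters):
        # E-step: posterior probability that each pair is a match.
        g = []
        for x in pairs:
            pm, pu = pi, 1.0 - pi
            for j, xj in enumerate(x):
                pm *= m[j] if xj else 1.0 - m[j]
                pu *= u[j] if xj else 1.0 - u[j]
            g.append(pm / (pm + pu))
        # M-step: weighted re-estimation of all parameters.
        total = sum(g)
        pi = total / len(pairs)
        for j in range(3):
            m[j] = sum(gi for gi, x in zip(g, pairs) if x[j]) / total
            u[j] = sum(1.0 - gi for gi, x in zip(g, pairs) if x[j]) / (len(pairs) - total)
    return pi, m, u

pi_hat, m_hat, u_hat = em(simulate())
```

The constrained variant in the paper differs in the M-step, which is restricted to a convex subregion of the parameter space rather than the full range used here.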
Bayesian Estimation and Testing of Structural Equation Models
Psychometrika, 1999
Abstract

Cited by 27 (8 self)
The Gibbs sampler can be used to obtain samples of arbitrary size from the posterior distribution over the parameters of a structural equation model (SEM) given covariance data and a prior distribution over the parameters. Point estimates, standard deviations and interval estimates for the parameters can be computed from these samples. If the prior distribution over the parameters is uninformative, the posterior is proportional to the likelihood, and asymptotically the inferences based on the Gibbs sample are the same as those based on the maximum likelihood solution, e.g., output from LISREL or EQS. In small samples, however, the likelihood surface is not Gaussian and in some cases contains local maxima. Nevertheless, the Gibbs sample comes from the correct posterior distribution over the parameters regardless of the sample size and the shape of the likelihood surface. With an informative prior distribution over the parameters, the posterior can be used to make inferences about the parameters of underidentified models, as we illustrate on a simple errors-in-variables model.
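A full SEM Gibbs sampler is too long to sketch here, but the alternating full-conditional draws the abstract relies on can be shown on a toy normal model with unknown mean and variance. The priors, data, and chain settings below are illustrative assumptions, not the paper's model.

```python
import random
import statistics

random.seed(2)

# Toy data: 200 draws from Normal(5, 2); both mean and variance unknown.
data = [random.gauss(5.0, 2.0) for _ in range(200)]
n = len(data)
ybar = statistics.fmean(data)

# With flat priors the full conditionals are conjugate:
#   mu | sigma2, y  ~ Normal(ybar, sigma2 / n)
#   sigma2 | mu, y  ~ Inv-Gamma(n / 2, sum((y - mu)^2) / 2)
mu, sigma2 = 0.0, 1.0
mu_draws = []
for it in range(3000):
    mu = random.gauss(ybar, (sigma2 / n) ** 0.5)
    ss = sum((y - mu) ** 2 for y in data)
    # An Inv-Gamma(a, b) draw is b divided by a Gamma(a, scale=1) draw.
    sigma2 = (ss / 2.0) / random.gammavariate(n / 2.0, 1.0)
    if it >= 500:  # discard burn-in
        mu_draws.append(mu)

post_mean = statistics.fmean(mu_draws)
```

The same alternating pattern, each parameter block drawn from its full conditional given the rest, is what the SEM sampler does over loadings, path coefficients, and error variances.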
Assessing model mimicry using the parametric bootstrap
Journal of Mathematical Psychology, 2004
Abstract

Cited by 19 (3 self)
We present a general sampling procedure to quantify model mimicry, defined as the ability of a model to account for data generated by a competing model. This sampling procedure, called the parametric bootstrap cross-fitting method (PBCM; cf. Williams (J. R. Statist. Soc. B 32 (1970) 350; Biometrics 26 (1970) 23)), generates distributions of differences in goodness-of-fit expected under each of the competing models. In the data informed version of the PBCM, the generating models have specific parameter values obtained by fitting the experimental data under consideration. The data informed difference distributions can be compared to the observed difference in goodness-of-fit to allow a quantification of model adequacy. In the data uninformed version of the PBCM, the generating models have a relatively broad range of parameter values based on prior knowledge. Application of both the data informed and the data uninformed PBCM is illustrated with several examples.
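The data-informed PBCM loop can be sketched with two simple competing models, normal versus Laplace, both fit by closed-form maximum likelihood. The model pair, sample sizes, and replication counts are hypothetical choices for illustration, not those used in the paper.

```python
import math
import random
import statistics

random.seed(3)

def loglik_normal(data, mu, sd):
    return sum(-0.5 * math.log(2 * math.pi * sd ** 2)
               - (x - mu) ** 2 / (2 * sd ** 2) for x in data)

def loglik_laplace(data, mu, b):
    return sum(-math.log(2 * b) - abs(x - mu) / b for x in data)

def fit_and_diff(data):
    """Fit both models by closed-form ML; return logL(normal) - logL(Laplace)."""
    mu, sd = statistics.fmean(data), statistics.pstdev(data)
    med = statistics.median(data)
    b = statistics.fmean(abs(x - med) for x in data)
    return loglik_normal(data, mu, sd) - loglik_laplace(data, med, b)

observed = [random.gauss(0.0, 1.0) for _ in range(300)]
obs_diff = fit_and_diff(observed)

# Data-informed PBCM: bootstrap from each model fitted to the observed data,
# refit both models to every replicate, and collect the GOF differences.
mu, sd = statistics.fmean(observed), statistics.pstdev(observed)
med = statistics.median(observed)
b = statistics.fmean(abs(x - med) for x in observed)

def laplace_draw():
    # The difference of two Exp(1) variates has a standard Laplace law.
    return med + b * (random.expovariate(1.0) - random.expovariate(1.0))

diff_under_normal = [fit_and_diff([random.gauss(mu, sd) for _ in range(300)])
                     for _ in range(200)]
diff_under_laplace = [fit_and_diff([laplace_draw() for _ in range(300)])
                      for _ in range(200)]
```

Comparing `obs_diff` against the two difference distributions quantifies how easily each model mimics the other: heavy overlap between the distributions signals strong mimicry.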
A Bayesian approach to the selection and testing of mixture models
Statistica Sinica, 2001
Abstract

Cited by 9 (3 self)
Abstract: An important aspect of mixture modeling is the selection of the number of mixture components. In this paper, we discuss the Bayes factor as a selection tool. The discussion will focus on two aspects: computation of the Bayes factor and prior sensitivity. For the computation, we propose a variant of Chib’s estimator that accounts for the non-identifiability of the mixture components. To reduce the prior sensitivity of the Bayes factor, we propose to extend the model with a hyperprior. We further discuss the use of posterior predictive checks for examining the fit of the model. The ideas are illustrated by means of a psychiatric diagnosis example.
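For a conjugate toy case the Bayes factor and its prior sensitivity can be computed exactly; the sketch below compares a point-null Binomial model against a free success probability under two Beta priors. The numbers are illustrative assumptions; the mixture setting in the paper has no such closed form and needs Chib-type estimators.

```python
from math import comb, exp, lgamma, log

# Observed: y successes in n Bernoulli trials (illustrative numbers).
n, y = 40, 28

def log_beta(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_marginal(a, b):
    """Log marginal likelihood of y under theta ~ Beta(a, b) (Beta-Binomial)."""
    return log(comb(n, y)) + log_beta(a + y, b + n - y) - log_beta(a, b)

# Point-null model: theta fixed at 0.5.
log_m0 = log(comb(n, y)) + n * log(0.5)

# Bayes factor for the free-theta model against the null under two priors;
# the dependence on the prior is exactly the sensitivity at issue.
bf_uniform = exp(log_marginal(1.0, 1.0) - log_m0)
bf_diffuse = exp(log_marginal(0.1, 0.1) - log_m0)
```

The two priors give noticeably different Bayes factors for the same data, which is the prior-sensitivity problem the hyperprior extension is meant to reduce.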
Bayesian inference with probability matrix decomposition models
Journal of Educational and Behavioral Statistics, 2001
Abstract

Cited by 6 (4 self)
Probability Matrix Decomposition (PMD) models may be used to model observed binary associations between two sets of elements. More specifically, to explain observed associations between two elements, it is assumed that B latent Bernoulli variables are realized for each element and that these variables are subsequently mapped into an observed data point according to a prespecified deterministic rule. In this paper, we present a fully Bayesian analysis for the PMD model making use of the Gibbs sampler. This approach is shown to yield three distinct advantages: (a) in addition to posterior mean estimates it yields (1 − α)% posterior intervals for the parameters, (b) it allows for an investigation of hypothesized indeterminacies in the model's parameters and for the visualization of the best possible reduction of the posterior distribution in a low-dimensional space, and (c) it allows for a broad range of goodness-of-fit tests, making use of the technique of posterior predictive checks. To illustrate the approach, we applied the PMD model to opinions of respondents of different countries concerning the possibility of contracting AIDS in a specific situation.
Avoiding model selection in Bayesian social research
Sociological Methodology, 1994
Abstract

Cited by 3 (1 self)
Raftery's paper addresses two important problems in the statistical analysis of social science data: (1) choosing an appropriate model when so much data are available that standard P-values reject all parsimonious models; and (2) making estimates and predictions when there are not enough data available to fit the desired model using standard techniques. For both problems, we agree with Raftery that classical frequentist methods fail and that Raftery's suggested methods based on BIC can point in better directions. Nevertheless, we disagree with his solutions because, in principle, they are still directed off-target and only by serendipity manage to hit the target in special circumstances. Our primary criticisms of Raftery's proposals are that (1) he promises the impossible: the selection of a model that is adequate for specific purposes without consideration of those purposes; and (2) he uses the same limited tool for model averaging as for model selection, thereby
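The BIC-versus-P-value tension this discussion concerns can be illustrated on a toy problem: choosing between a fixed-mean and a free-mean normal model. The data and both models below are hypothetical.

```python
import math
import random
import statistics

random.seed(4)

# Toy data: n draws from a standard normal (mean genuinely zero).
data = [random.gauss(0.0, 1.0) for _ in range(500)]
n = len(data)

def loglik(mu):
    """Log likelihood of a Normal(mu, 1) model for the data."""
    return sum(-0.5 * math.log(2 * math.pi) - (x - mu) ** 2 / 2 for x in data)

def bic(loglik_value, k):
    """Schwarz criterion with k free parameters; smaller is better."""
    return -2.0 * loglik_value + k * math.log(n)

xbar = statistics.fmean(data)
bic_null = bic(loglik(0.0), k=0)   # mean fixed at zero
bic_free = bic(loglik(xbar), k=1)  # mean estimated from the data

# The gap reduces to the identity bic_null - bic_free = n*xbar^2 - ln(n):
# the free mean wins only when the z-statistic exceeds sqrt(ln n), a bar
# that, unlike a fixed P-value cutoff, rises with the sample size.
```

This is the mechanism behind Raftery's observation that with very large n, conventional P-values reject every parsimonious model while BIC may still retain one.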
Modeling Behavioral Regularities of Consumer Learning in Conjoint Analysis
2004
Abstract

Cited by 1 (1 self)
In this note, we propose several extensions of the model of consumer learning in conjoint analysis developed by Bradlow, Hu, and Ho (2004). We present clarification of the original model; propose an integration of several new imputation rules; add new measurement metrics for pattern matching; and draw a roadmap for further real-world tests. We discuss general modeling challenges when one wants to mathematically define and integrate behavioral regularities into traditional quantitative domains. We conclude by suggesting several critical success factors for modeling behavioral regularities in marketing. We welcome the constructive comments on our paper (Bradlow, Hu and Ho 2004; BHH hereafter) by Alba and Cooke (2004), Rao (2004), and Rubin (2004). Since a major goal of our paper is to enrich conjoint analysis with a stronger behavioral foundation, we are pleased to hear from our colleagues in Marketing, all of whom have both behavioral modeling and quantitative interests, and from Rubin, who first introduced the formal nomenclature of missing data methods into the statistics literature (Rubin 1976). We believe such dialogue will allow us to harness the strengths of varied research paradigms and make marketing theories more precise and predictive of actual consumer behavior. We would like to organize our responses to the three comments along four subsections. The first section includes general responses that touch on the issues of research language and mathematical formalism and the last three are specific responses to the comments in terms of clarification, additional data analyses, and model extensions.