Results 11–20 of 306
Approximate Bayes Factors and Accounting for Model Uncertainty in Generalized Linear Models
, 1993
Abstract

Cited by 149 (28 self)
Ways of obtaining approximate Bayes factors for generalized linear models are described, based on the Laplace method for integrals. I propose a new approximation which uses only the output of standard computer programs such as GLIM; this appears to be quite accurate. A reference set of proper priors is suggested, both to represent the situation where there is not much prior information, and to assess the sensitivity of the results to the prior distribution. The methods can be used when the dispersion parameter is unknown, when there is overdispersion, to compare link functions, and to compare error distributions and variance functions. The methods can be used to implement the Bayesian approach to accounting for model uncertainty. I describe an application to inference about relative risks in the presence of control factors where model uncertainty is large and important. Software to implement the ...
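The Laplace method this abstract refers to approximates an integral by a Gaussian fitted at the integrand's mode. A minimal sketch (not the paper's GLIM-based variant) for a Bernoulli marginal likelihood, where the exact answer is a Beta function and so can be checked:

```python
import math

def laplace_marginal(s, n):
    """Laplace approximation to the Bernoulli marginal likelihood:
    the integral of theta^s * (1-theta)^(n-s) over [0, 1] (flat prior)."""
    theta = s / n                                   # mode of the integrand
    log_h = s * math.log(theta) + (n - s) * math.log(1 - theta)
    h2 = -s / theta**2 - (n - s) / (1 - theta)**2   # second derivative at the mode
    return math.exp(log_h) * math.sqrt(2 * math.pi / -h2)

def exact_marginal(s, n):
    """Exact value: the Beta function B(s+1, n-s+1)."""
    return math.exp(math.lgamma(s + 1) + math.lgamma(n - s + 1)
                    - math.lgamma(n + 2))

approx, exact = laplace_marginal(7, 20), exact_marginal(7, 20)
# the two values agree to within a few percent
```

Even with only 20 observations the approximation is within a few percent of the exact integral, which is the accuracy behavior the abstract alludes to.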
An algebra for probabilistic databases
Abstract

Cited by 148 (1 self)
An algebra is presented for a simple probabilistic data model that may be regarded as an extension of the standard relational model. The probabilistic algebra is developed in such a way that (restricted to α-acyclic database schemes) the relational algebra is a homomorphic image of it. Strictly probabilistic results are emphasized. Variations on the basic probabilistic data model are discussed. The algebra is used to explicate a commonly used statistical smoothing procedure and is shown to be potentially very useful for decision support with uncertain information.
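As a toy illustration of the flavor of such an algebra (assuming tuple independence, which is only one of the variations the paper considers, and using made-up relation names), probabilistic selection and projection might look like:

```python
# A probabilistic relation: each tuple maps to its marginal probability.
emp = {
    ("alice", "sales"): 0.9,
    ("alice", "hr"):    0.1,
    ("bob",   "sales"): 0.6,
}

def select(rel, pred):
    """Selection keeps qualifying tuples with their probabilities unchanged."""
    return {t: p for t, p in rel.items() if pred(t)}

def project(rel, cols):
    """Projection onto column indices; duplicates are merged assuming
    tuple independence: P(result tuple exists) = 1 - prod(1 - p_i)."""
    out = {}
    for t, p in rel.items():
        key = tuple(t[i] for i in cols)
        out[key] = 1 - (1 - out.get(key, 0.0)) * (1 - p)
    return out

depts = project(emp, [1])
# P(("sales",)) = 1 - (1 - 0.9) * (1 - 0.6) = 0.96
```

Restricted to ordinary relations (all probabilities 1), these operators reduce to their relational counterparts, which is the homomorphism property the abstract mentions.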
Matching and Record Linkage
 Business Survey Methods
, 1995
Abstract

Cited by 122 (16 self)
INTRODUCTION Matching has a long history of uses in statistical surveys and administrative data development. A business register consisting of names, addresses, and other identifying information such as total financial receipts might be constructed from tax and employment databases (see chapters by Colledge, Nijhowne, and Archer). A survey of retail establishments or agricultural establishments might combine results from an area frame and a list frame. To produce a combined estimator, units from the area frame would need to be identified in the list frame (see Vogel-Kott chapter). To estimate the size of a (sub)population via capture-recapture techniques, one needs to accurately determine units common to two or more independent listings (Sekar and Deming 1949; Scheuren 1983; Winkler 1989b). Samples must be drawn appropriately to estimate overlap (Deming and Gleser 1959). Rather than develop a special survey to collect data for policy decisions, it might be more appropriate t ...
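The capture-recapture idea cited above (Sekar and Deming 1949) reduces, in its simplest two-list form, to the classical Lincoln-Petersen estimator; the counts below are illustrative:

```python
def lincoln_petersen(n1, n2, m):
    """Estimate total population size from two independent listings:
    n1 and n2 are the list sizes, m the number of units matched in both."""
    if m == 0:
        raise ValueError("no overlap between lists: estimator undefined")
    return n1 * n2 / m

# 500 units on list A, 400 on list B, 250 accurately matched in both
est = lincoln_petersen(500, 400, 250)   # -> 800.0
```

The estimate is only as good as the matching: undetected common units inflate it, which is why the chapter stresses accurately determining the units common to the listings.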
Social Background and School Continuation Decisions
 Journal of the American Statistical Association
, 1981
Abstract

Cited by 121 (2 self)
David Featherman and Robert Hauser for making available the OCG II data. An
Independence is Good: Dependency-Based Histogram Synopses for High-Dimensional Data
 In SIGMOD
, 2001
Abstract

Cited by 70 (12 self)
Approximating the joint data distribution of a multidimensional data set through a compact and accurate histogram synopsis is a fundamental problem arising in numerous practical scenarios, including query optimization and approximate query answering. Existing solutions either rely on simplistic independence assumptions or try to directly approximate the full joint data distribution over the complete set of attributes. Unfortunately, both approaches are doomed to fail for high-dimensional data sets with complex correlation patterns between attributes. In this paper, we propose a novel approach to histogram-based synopses that employs the solid foundation of statistical interaction models to explicitly identify and exploit the statistical characteristics of the data. Abstractly, our key idea is to break the synopsis into (1) a statistical interaction model that accurately captures significant correlation and independence patterns in data, and (2) a collection of histograms on low-dimensional marginals that, based on the model, can provide accurate approximations of the overall joint data distribution. Extensive experimental results with several real-life data sets verify the effectiveness of our approach. An important aspect of our general, model-based methodology is that it can be used to enhance the performance of other synopsis techniques that are based on data-space partitioning (e.g., wavelets) by providing an effective tool to deal with the “dimensionality curse”.
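The decomposition idea can be sketched with the conditional-independence identity p(a,b,c) = p(a,b)·p(b,c)/p(b): when a and c are independent given b, two 2-dimensional marginals reconstruct the 3-dimensional joint exactly. A toy check (not the paper's histogram machinery):

```python
from itertools import product
from collections import defaultdict

# Toy joint over (a, b, c) in which a and c are conditionally
# independent given b, so the model p(a,b,c) = p(a,b)*p(b,c)/p(b) is exact.
p_a = {0: 0.3, 1: 0.7}
p_b_given_a = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.4, 1: 0.6}}
p_c_given_b = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}

joint = {}
for a, b, c in product([0, 1], repeat=3):
    joint[a, b, c] = p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]

def marginal(dist, keep):
    """Sum out all dimensions except the index positions in `keep`."""
    out = defaultdict(float)
    for key, p in dist.items():
        out[tuple(key[i] for i in keep)] += p
    return out

p_ab = marginal(joint, (0, 1))
p_bc = marginal(joint, (1, 2))
p_b = marginal(joint, (1,))

# Reconstruct the 3-way joint from the two 2-way "histograms".
recon = {(a, b, c): p_ab[(a, b)] * p_bc[(b, c)] / p_b[(b,)]
         for a, b, c in joint}
err = max(abs(recon[k] - joint[k]) for k in joint)
```

Storing two 2-dimensional tables instead of one 3-dimensional one is the source of the space savings; the paper's contribution is finding such a model from data and pairing it with histograms on the marginals.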
The roles of conflict engagement, escalation, and avoidance in marital interaction: A longitudinal view of the five types of couples
 Journal of Consulting and Clinical Psychology
, 1993
Abstract

Cited by 50 (4 self)
Seventy-three couples were studied at 2 time points 4 years apart. A typology of 5 groups of couples is proposed on the basis of observational data of Time 1 resolution of conflict, specific affects, and affect sequences. Over the 4 years, the groups of couples differed significantly in serious considerations of divorce and in the frequency of divorce. There were 3 groups of stable couples: validators, volatiles, and avoiders, who could be distinguished from each other on problem-solving behavior, specific affects, and persuasion attempts. There were 2 groups of unstable couples: hostile and hostile/detached, who could be distinguished from each other on problem-solving behavior and on specific negative and positive affects. A balance theory of marriage is proposed, which explores the idea that 3 distinct adaptations exist for having a stable marriage.
An Application Of The Fellegi-Sunter Model Of Record Linkage To The 1990 U.S. Decennial Census
 Technical report, U.S. Bureau of the Census
, 1987
Abstract

Cited by 42 (4 self)
This paper describes a methodology for computer matching the Post Enumeration Survey with the Census. Computer matching is the first stage of a process for producing adjusted Census counts. All crucial matching parameters are computed solely using characteristics of the files being matched. No a priori knowledge of truth of matches is assumed. No previously created lookup tables are needed. The methods are illustrated with numerical results using files from the 1988 Dress Rehearsal Census for which the truth of matches is known.
Key words and phrases: EM Algorithm; String Comparator Metric; LP Algorithm; Decision Rule; Error Rate.
1. INTRODUCTION
This paper describes a particular application of the Fellegi-Sunter (1969) model of record linkage. New computational methods are used for computer matching the Post Enumeration Survey (PES) with the Census. The PES is used to produce adjusted Census counts. Computer matching is the first stage of PES processing. All crucial matching paramete...
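The Fellegi-Sunter decision rule underlying such matching can be sketched as summed log-likelihood-ratio weights per field with two cutoffs; the m/u probabilities and cutoffs below are illustrative, not the estimates from the PES application:

```python
import math

# Hypothetical per-field agreement probabilities:
#   m[k] = P(field k agrees | record pair is a true match)
#   u[k] = P(field k agrees | record pair is a non-match)
m = {"surname": 0.95, "first": 0.90, "street": 0.80}
u = {"surname": 0.01, "first": 0.05, "street": 0.10}

def match_weight(agreements):
    """Total log2 likelihood-ratio weight of a comparison vector."""
    w = 0.0
    for field in m:
        if agreements[field]:
            w += math.log2(m[field] / u[field])
        else:
            w += math.log2((1 - m[field]) / (1 - u[field]))
    return w

def decide(w, upper=8.0, lower=0.0):
    """Fellegi-Sunter three-way decision rule with illustrative cutoffs."""
    return "link" if w >= upper else "non-link" if w < lower else "review"

w = match_weight({"surname": True, "first": True, "street": False})
```

The paper's contribution is estimating the m and u parameters directly from the files being matched (via the EM algorithm) rather than assuming them, and choosing cutoffs to control error rates.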
A simple constraint-based algorithm for efficiently mining observational databases for causal relationships
 Data Mining and Knowledge Discovery
, 1997
Abstract

Cited by 41 (2 self)
This paper presents a simple, efficient computer-based method for discovering causal relationships from databases that contain observational data. Observational data is passively observed, as contrasted with experimental data. Most of the databases available for data mining are observational. There is great potential for mining such databases to discover causal relationships. We illustrate how observational data can constrain the causal relationships among measured variables, sometimes to the point that we can conclude that one variable is causing another variable. The presentation here is based on a constraint-based approach to causal discovery. A primary purpose of this paper is to present the constraint-based causal discovery method in the simplest possible fashion in order to (1) readily convey the basic ideas that underlie more complex constraint-based causal discovery techniques, and (2) permit interested readers to rapidly program and apply the method to their own databases, as a start toward using more elaborate causal discovery algorithms.
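The constraint-based recipe (remove edges licensed by conditional independence, then orient colliders) can be sketched against a hypothetical independence oracle; here the oracle's facts encode the structure X → Z ← Y:

```python
from itertools import combinations

# Hypothetical conditional-independence facts, as an oracle would report
# them for X -> Z <- Y: X and Y are marginally independent.
independent = {("X", "Y", frozenset())}

def skeleton(vars_, indep):
    """Start from the complete graph; drop any edge whose endpoints are
    independent given some set (here: empty or single-node sets only)."""
    edges = set(map(frozenset, combinations(vars_, 2)))
    for a, b in combinations(vars_, 2):
        for cond in [frozenset()] + [frozenset([v]) for v in vars_
                                     if v not in (a, b)]:
            if (a, b, cond) in indep or (b, a, cond) in indep:
                edges.discard(frozenset([a, b]))
    return edges

def colliders(vars_, edges, indep):
    """Orient a -> c <- b when a-c and b-c are edges, a-b is not, and c
    was NOT in the conditioning set that separated a and b."""
    out = []
    for a, b in combinations(vars_, 2):
        if frozenset([a, b]) in edges:
            continue
        for c in vars_:
            if (frozenset([a, c]) in edges and frozenset([b, c]) in edges
                    and not any(c in cond for x, y, cond in indep
                                if {x, y} == {a, b})):
                out.append((a, c, b))
    return out

V = ["X", "Y", "Z"]
E = skeleton(V, independent)
vs = colliders(V, E, independent)   # X -> Z <- Y is recovered
```

In practice the oracle is replaced by statistical independence tests on the observational data, which is where sample size and test reliability enter.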
Improved decision rules in the Fellegi-Sunter model of record linkage
 in American Statistical Association Proceedings of Survey Research Methods Section
, 1993
Abstract

Cited by 39 (12 self)
Many applications of the Fellegi-Sunter model use simplifying assumptions and ad hoc modifications to improve matching efficacy. Because of model misspecification, distinctive approaches developed in one application typically cannot be used in other applications and do not always make use of advances in statistical and computational theory. An Expectation-Maximization (EMH) algorithm that constrains the estimates to a convex subregion of the parameter space is given. The EMH algorithm provides probability estimates that yield better decision rules than unconstrained estimates. The algorithm is related to results of Meng and Rubin (1993) on Multi-Cycle Expectation-Conditional Maximization algorithms and makes use of results of Haberman (1977) that hold for large classes of log-linear models.
Key Words: MCECM Algorithm, Latent Class, Computer Matching, Error Rate
This paper provides a theory for obtaining constrained maximum likelihood estimates for latent-class, log-linear models on finite state spaces. The work is related to Expectation-Maximization (EM) algorithms by Meng and Rubin (1993) for obtaining unconstrained maximum likelihood estimates. Meng and Rubin generalized the original ideas of Dempster, ...
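A crude sketch of the idea of constraining EM estimates in a latent-class matching model: after each unconstrained M-step, the estimates are pushed back into the region where matches agree at least as often as non-matches. The simple clamp below stands in for the convex-region projection of the actual EMH algorithm, and the comparison vectors and starting values are made up:

```python
import math

# Binary comparison vectors (agree/disagree on 2 fields) for record pairs.
pairs = [(1, 1), (1, 1), (1, 0), (0, 1), (0, 0), (0, 0), (0, 0), (1, 0)]

pi, m, u = 0.3, [0.7, 0.7], [0.3, 0.3]   # hypothetical starting values

def lik(vec, p):
    """Conditional-independence likelihood of a comparison vector."""
    return math.prod(p[k] if vec[k] else 1 - p[k] for k in range(len(p)))

for _ in range(200):
    # E-step: posterior probability that each pair is a match
    g = []
    for vec in pairs:
        a = pi * lik(vec, m)
        b = (1 - pi) * lik(vec, u)
        g.append(a / (a + b))
    # M-step: re-estimate mixing weight and per-field agreement rates
    pi = sum(g) / len(pairs)
    for k in range(2):
        m[k] = sum(gi * v[k] for gi, v in zip(g, pairs)) / sum(g)
        u[k] = (sum((1 - gi) * v[k] for gi, v in zip(g, pairs))
                / (len(pairs) - sum(g)))
        # constraint: matches must agree at least as often as non-matches
        if m[k] < u[k]:
            m[k] = u[k] = (m[k] + u[k]) / 2
```

Without the constraint, unconstrained EM on sparse or misspecified data can swap the latent classes or drift to implausible estimates; keeping the solution in the constrained region is what yields the better decision rules the abstract claims.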