Results 11–20 of 156
Independence is Good: Dependency-Based Histogram Synopses for High-Dimensional Data
In SIGMOD, 2001
"... Approximating the joint data distribution of a multidimensional data set through a compact and accurate histogram synopsis is a fundamental problem arising in numerous practical scenarios, including query optimization and approximate query answering. Existing solutions either rely on simplistic ind ..."
Abstract

Cited by 61 (11 self)
Approximating the joint data distribution of a multidimensional data set through a compact and accurate histogram synopsis is a fundamental problem arising in numerous practical scenarios, including query optimization and approximate query answering. Existing solutions either rely on simplistic independence assumptions or try to directly approximate the full joint data distribution over the complete set of attributes. Unfortunately, both approaches are doomed to fail for high-dimensional data sets with complex correlation patterns between attributes. In this paper, we propose a novel approach to histogram-based synopses that employs the solid foundation of statistical interaction models to explicitly identify and exploit the statistical characteristics of the data. Abstractly, our key idea is to break the synopsis into (1) a statistical interaction model that accurately captures significant correlation and independence patterns in data, and (2) a collection of histograms on low-dimensional marginals that, based on the model, can provide accurate approximations of the overall joint data distribution. Extensive experimental results with several real-life data sets verify the effectiveness of our approach. An important aspect of our general, model-based methodology is that it can be used to enhance the performance of other synopsis techniques that are based on data-space partitioning (e.g., wavelets) by providing an effective tool to deal with the “dimensionality curse”.
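As a rough illustration of how such a two-part synopsis can reconstruct the joint distribution from low-dimensional pieces (the paper's actual construction may differ in detail), suppose the interaction model asserts that attribute A is conditionally independent of C given B; the joint then factors into marginals small enough to summarize with individual histograms:

```latex
% Illustrative factorization under the assumed independence A \perp C | B;
% each low-dimensional marginal is approximated by its own compact histogram.
\[
  \hat{p}(a,b,c) \;=\; \frac{\hat{p}_{AB}(a,b)\,\hat{p}_{BC}(b,c)}{\hat{p}_{B}(b)}
\]
```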
Improved decision rules in the Fellegi-Sunter model of record linkage
In American Statistical Association Proceedings of Survey Research Methods Section, 1993
"... Many applications of the FellegiSunter model use simplifying assumptions and ad hoc modifications to improve matching efficacy. Because of model misspecification, distinctive approaches developed in one application typically cannot be used in other applications and do not always make use of advance ..."
Abstract

Cited by 33 (12 self)
Many applications of the Fellegi-Sunter model use simplifying assumptions and ad hoc modifications to improve matching efficacy. Because of model misspecification, distinctive approaches developed in one application typically cannot be used in other applications and do not always make use of advances in statistical and computational theory. An Expectation-Maximization (EMH) algorithm that constrains the estimates to a convex subregion of the parameter space is given. The EMH algorithm provides probability estimates that yield better decision rules than unconstrained estimates. The algorithm is related to results of Meng and Rubin (1993) on Multi-Cycle Expectation-Conditional Maximization algorithms and makes use of results of Haberman (1977) that hold for large classes of log-linear models. Key Words: MCECM Algorithm, Latent Class, Computer Matching, Error Rate. This paper provides a theory for obtaining constrained maximum likelihood estimates for latent-class, log-linear models on finite state spaces. The work is related to Expectation-Maximization (EM) algorithms by Meng and Rubin (1993) for obtaining unconstrained maximum likelihood estimates. Meng and Rubin generalized the original ideas of Dempster,
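For orientation only, the sketch below shows EM for the basic two-class, conditional-independence latent-class model that underlies Fellegi-Sunter matching, with a crude m_k >= u_k constraint re-imposed after each M-step. The paper's EMH algorithm works with more general log-linear latent-class models and a different convex constraint region, so treat this purely as a stand-in.

```python
# Minimal EM sketch for a two-class (match / nonmatch) latent-class model over
# binary agreement vectors, with a crude constraint (matches agree at least as
# often as nonmatches) re-imposed after each M-step.  Illustrative only; the
# EMH algorithm in the paper uses richer models and a different constraint set.
import numpy as np

def em_record_linkage(gamma, n_iter=50):
    """gamma: (n_pairs, n_fields) array of 0/1 field-agreement indicators."""
    n, k = gamma.shape
    p, m, u = 0.1, np.full(k, 0.8), np.full(k, 0.2)        # initial guesses
    for _ in range(n_iter):
        # E-step: posterior probability that each pair is a true match
        lm = p * np.prod(m**gamma * (1 - m)**(1 - gamma), axis=1)
        lu = (1 - p) * np.prod(u**gamma * (1 - u)**(1 - gamma), axis=1)
        w = lm / (lm + lu)
        # M-step: weighted agreement rates within each latent class
        p = w.mean()
        m = (w[:, None] * gamma).sum(axis=0) / w.sum()
        u = ((1 - w)[:, None] * gamma).sum(axis=0) / (1 - w).sum()
        # constrain the estimates and keep them away from the boundary
        m = np.maximum(m, u + 1e-6)
        m, u = np.clip(m, 1e-6, 1 - 1e-6), np.clip(u, 1e-6, 1 - 1e-6)
    return p, m, u
```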
Sequential Model Selection for Word Sense Disambiguation
, 1997
"... Statistical models of wordsense disam biguation are often based on a small num ber of contextual features or on a model that is assumed to characterize the inter actions among a set of features. Model selection is presented as an alternative to these approaches, where a sequential search ..."
Abstract

Cited by 28 (13 self)
Statistical models of word-sense disambiguation are often based on a small number of contextual features or on a model that is assumed to characterize the interactions among a set of features. Model selection is presented as an alternative to these approaches, where a sequential search of possible models is conducted in order to find the model that best characterizes the interactions among features. This paper expands existing model selection methodology and presents the first comparative study of model selection search strategies and evaluation criteria when applied to the problem of building probabilistic classifiers for word-sense disambiguation.
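A hedged skeleton of the forward sequential search idea is given below: at each step the candidate term whose addition most improves the evaluation criterion (here AIC, where lower is better) is added, and the search stops when no candidate improves it. `candidates`, `fit`, and `aic` are hypothetical stand-ins for the decomposable-model machinery the paper actually searches over.

```python
# Skeleton of a forward sequential model search driven by AIC.
# `candidates` are terms that may be added to the model, `fit` estimates a
# model containing a given set of terms, and `aic` scores a fitted model;
# all three are hypothetical placeholders for the paper's actual machinery.
def forward_search(candidates, fit, aic):
    model, remaining = [], list(candidates)
    best_score = aic(fit(model))
    while remaining:
        scored = [(aic(fit(model + [c])), c) for c in remaining]
        score, best = min(scored, key=lambda t: t[0])   # lower AIC is better
        if score >= best_score:                          # no improvement: stop
            break
        best_score = score
        model.append(best)
        remaining.remove(best)
    return model
```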
A simple constraint-based algorithm for efficiently mining observational databases for causal relationships
Data Mining and Knowledge Discovery, 1997
"... Abstract. This paper presents a simple, efficient computerbased method for discovering causal relationships from databases that contain observational data. Observational data is passively observed, as contrasted with experimental data. Most of the databases available for data mining are observation ..."
Abstract

Cited by 28 (2 self)
This paper presents a simple, efficient computer-based method for discovering causal relationships from databases that contain observational data. Observational data is passively observed, as contrasted with experimental data. Most of the databases available for data mining are observational. There is great potential for mining such databases to discover causal relationships. We illustrate how observational data can constrain the causal relationships among measured variables, sometimes to the point that we can conclude that one variable is causing another variable. The presentation here is based on a constraint-based approach to causal discovery. A primary purpose of this paper is to present the constraint-based causal discovery method in the simplest possible fashion in order to (1) readily convey the basic ideas that underlie more complex constraint-based causal discovery techniques, and (2) permit interested readers to rapidly program and apply the method to their own databases, as a start toward using more elaborate causal discovery algorithms.
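To make the flavor of such a constraint-based rule concrete, the sketch below encodes one commonly cited pattern: if W is assumed to have no causes among the measured variables, W and X are dependent, X and Y are dependent, and W is independent of Y given X, then (under the usual causal assumptions) the data support X causing Y. The test functions are hypothetical placeholders for ordinary statistical tests on the observational data, not the paper's code.

```python
# One constraint-based pattern for inferring causation from observational data.
# `dependent` and `cond_independent` stand for statistical (in)dependence tests
# (e.g. chi-square tests) and are hypothetical placeholders.
def suggests_x_causes_y(W, X, Y, dependent, cond_independent):
    """W is assumed to have no causes among the measured variables."""
    return (dependent(W, X)
            and dependent(X, Y)
            and cond_independent(W, Y, given=X))
```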
An Application of the Fellegi-Sunter Model of Record Linkage to the 1990 U.S. Decennial Census
Technical report, US Bureau of the Census, 1987
"... This paper describes a methodology for computer matching the Post Enumeration Survey with the Census. Computer matching is the first stage of a process for producing adjusted Census counts. All crucial matching parameters are computed solely using characteristics of the files being matched. No a pri ..."
Abstract

Cited by 25 (4 self)
This paper describes a methodology for computer matching the Post Enumeration Survey with the Census. Computer matching is the first stage of a process for producing adjusted Census counts. All crucial matching parameters are computed solely using characteristics of the files being matched. No a priori knowledge of truth of matches is assumed. No previously created lookup tables are needed. The methods are illustrated with numerical results using files from the 1988 Dress Rehearsal Census for which the truth of matches is known. Key words and phrases: EM Algorithm; String Comparator Metric; LP Algorithm; Decision Rule; Error Rate. 1. INTRODUCTION This paper describes a particular application of the Fellegi-Sunter (1969) model of record linkage. New computational methods are used for computer matching the Post Enumeration Survey (PES) with the Census. The PES is used to produce adjusted Census counts. Computer matching is the first stage of PES processing. All crucial matching paramete...
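As a reminder of the decision-rule side of the Fellegi-Sunter framework referred to above, here is a minimal sketch of the standard weight-and-threshold rule: per-field log-likelihood-ratio weights are summed and compared against two cutoffs chosen for target error rates. The probabilities and thresholds below are placeholders, not the values estimated for the 1988 Dress Rehearsal files.

```python
# Standard Fellegi-Sunter decision rule (sketch).  m[k] and u[k] are the
# probabilities of agreement on field k among true matches and true nonmatches;
# the thresholds are illustrative placeholders.
import math

def fs_decision(agree, m, u, upper=3.0, lower=-3.0):
    """agree: list of 0/1 field agreements for one candidate record pair."""
    weight = sum(
        math.log2(mk / uk) if a else math.log2((1 - mk) / (1 - uk))
        for a, mk, uk in zip(agree, m, u)
    )
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "nonmatch"
    return "clerical review"
```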
A New Supervised Learning Algorithm for Word Sense Disambiguation
 In Proceedings of the Fourteenth National Conference on Artificial Intelligence
"... The Naive Mix is a new supervised learning algorithm that is based on a sequential method for selecting probabilistic models. The usual objective of model selection is to find a single model that adequately characterizes the data in a training sample. However, during model selection a sequence of mo ..."
Abstract

Cited by 25 (12 self)
The Naive Mix is a new supervised learning algorithm that is based on a sequential method for selecting probabilistic models. The usual objective of model selection is to find a single model that adequately characterizes the data in a training sample. However, during model selection a sequence of models is generated that consists of the best-fitting model at each level of model complexity. The Naive Mix utilizes this sequence of models to define a probabilistic model which is then used as a probabilistic classifier to perform word-sense disambiguation. The models in this sequence are restricted to the class of decomposable log-linear models. This class of models offers a number of computational advantages. Experiments disambiguating twelve different words show that a Naive Mix formulated with a forward sequential search and Akaike's Information Criteria rivals established supervised learning algorithms such as decision trees (C4.5), rule induction (CN2) and nearest-neighbor classif...
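One plausible way to turn such a sequence of selected models into a single probabilistic classifier is to average the class distributions the models assign to a context, sketched below. The actual Naive Mix combination may differ, and `predict_proba` is a hypothetical interface, so read this only as an illustration of mixing a model sequence.

```python
# Illustrative combination of a sequence of fitted models into one classifier
# by averaging their predicted sense distributions.  `predict_proba` is a
# hypothetical interface returning a dict mapping sense -> probability.
def naive_mix_predict(models, context):
    totals = {}
    for model in models:
        for sense, prob in model.predict_proba(context).items():
            totals[sense] = totals.get(sense, 0.0) + prob
    n = len(models)
    return {sense: p / n for sense, p in totals.items()}
```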
On the Dirichlet Prior and Bayesian Regularization
In Advances in Neural Information Processing Systems 15, 2002
"... A common objective in learning a model from data is to recover its network structure, while the model parameters are of minor interest. For example, we may wish to recover regulatory networks from highthroughput data sources. In this paper we examine how Bayesian regularization using a Dirichle ..."
Abstract

Cited by 22 (2 self)
A common objective in learning a model from data is to recover its network structure, while the model parameters are of minor interest. For example, we may wish to recover regulatory networks from high-throughput data sources. In this paper we examine how Bayesian regularization using a Dirichlet prior over the model parameters affects the learned model structure in a domain with discrete variables. Surprisingly, a weak prior in the sense of smaller equivalent sample size leads to a strong regularization of the model structure (sparse graph) given a sufficiently large data set. In particular, the empty graph is obtained in the limit of a vanishing strength of prior belief. This is diametrically opposite to what one may expect in this limit, namely the complete graph from an (unregularized) maximum likelihood estimate. Since the prior affects the parameters as expected, the prior strength balances a "trade-off" between regularizing the parameters or the structure of the model. We demonstrate the benefits of optimizing this trade-off in the sense of predictive accuracy.
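The structure comparison described here rests on the marginal likelihood of the data under a Dirichlet parameter prior. For reference, a standard form of that score (the BDeu score, assuming this is the family the paper works with; its notation may differ) with equivalent sample size α is:

```latex
% BDeu marginal likelihood of a structure G with equivalent sample size \alpha.
% i indexes variables, j parent configurations, k variable states;
% r_i = number of states of X_i, q_i = number of parent configurations of X_i.
\[
  p(D \mid G) \;=\; \prod_{i=1}^{n} \prod_{j=1}^{q_i}
    \frac{\Gamma(\alpha_{ij})}{\Gamma(\alpha_{ij} + N_{ij})}
    \prod_{k=1}^{r_i}
    \frac{\Gamma(\alpha_{ijk} + N_{ijk})}{\Gamma(\alpha_{ijk})},
  \qquad
  \alpha_{ijk} = \frac{\alpha}{r_i q_i},\;
  \alpha_{ij} = \sum_k \alpha_{ijk}.
\]
```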
The mixture transition distribution model for high-order Markov chains and non-Gaussian time series
Statistical Science, 2002
"... Abstract. The mixture transition distribution model (MTD) was introduced in 1985 by Raftery for the modeling of highorder Markov chains with a finite state space. Since then it has been generalized and successfully applied to a range of situations, including the analysis of wind directions, DNA seq ..."
Abstract

Cited by 19 (2 self)
The mixture transition distribution model (MTD) was introduced in 1985 by Raftery for the modeling of high-order Markov chains with a finite state space. Since then it has been generalized and successfully applied to a range of situations, including the analysis of wind directions, DNA sequences and social behavior. Here we review the MTD model and the developments since 1985. We first introduce the basic principle and then we present several extensions, including general state spaces and spatial statistics. Following that, we review methods for estimating the model parameters. Finally, a review of different types of applications shows the practical interest of the MTD model. Key words and phrases: Mixture transition distribution (MTD) model, Markov chains, high-order dependences, time series, GMTD model, EM algorithm,
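For reference, the basic MTD form introduced by Raftery expresses the transition probability of an order-l chain as a mixture of single-lag contributions through one transition matrix; the generalizations reviewed in the paper relax this in various ways.

```latex
% Basic MTD model: Q = (q_{ij}) is a common transition matrix and the
% \lambda_g are nonnegative mixing weights over the l lags.
\[
  P(X_t = i_0 \mid X_{t-1} = i_1, \ldots, X_{t-l} = i_l)
    \;=\; \sum_{g=1}^{l} \lambda_g \, q_{i_g i_0},
  \qquad \lambda_g \ge 0,\ \ \sum_{g=1}^{l} \lambda_g = 1.
\]
```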
Unsupervised Non Stationary Image Segmentation Using Triplet Markov Chains
In Advanced Concepts for Intelligent Vision Systems (ACIVS 04), 2004
"... This work deals with the unsupervised Bayesian hidden Markov chain restoration extended to the non stationary case. Unsupervised restoration based on "ExpectationMaximization " (EM) or "Stochastic EM" (SEM) estimates considering the "Hidden Markov Chain" (HMC) model is quite efficient when the hidde ..."
Abstract

Cited by 19 (7 self)
This work deals with unsupervised Bayesian hidden Markov chain restoration extended to the non-stationary case. Unsupervised restoration based on "Expectation-Maximization" (EM) or "Stochastic EM" (SEM) estimates considering the "Hidden Markov Chain" (HMC) model is quite efficient when the hidden chain is stationary. However, when the latter is not stationary, the unsupervised restoration results can be poor, due to a bad match between the real and estimated models. In this paper we present a more appropriate model for non-stationary HMC, via the recent Triplet Markov Chains (TMC) model. Using TMC, we show that the classical restoration results can be significantly improved in the case of non-stationary data. The latter improvement is performed in an unsupervised way using a SEM parameter estimation method. Some application examples to unsupervised image segmentation are also provided.
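A hedged sketch of the underlying idea follows: an auxiliary regime process U switches the transition matrix of the hidden chain X, and Y is a noisy observation of X, so the triplet (X, U, Y) behaves like a stationary Markov model even though X alone is non-stationary. All numerical values are illustrative, and this simplified generator is not the paper's TMC construction or its SEM estimation procedure.

```python
# Toy generator: the pair (U, X) is a Markov chain in which the slowly
# switching regime U selects the transition matrix used for X; Y observes X
# through Gaussian noise.  All values are illustrative placeholders.
import numpy as np

def sample_triplet_chain(T, rng=np.random.default_rng(0)):
    A = [np.array([[0.95, 0.05], [0.05, 0.95]]),   # regime 0: sticky states
         np.array([[0.50, 0.50], [0.50, 0.50]])]   # regime 1: rapid switching
    regime_switch = 0.01                            # P(U changes at each step)
    u, x = 0, 0
    X, U, Y = [], [], []
    for _ in range(T):
        if rng.random() < regime_switch:
            u = 1 - u
        x = rng.choice(2, p=A[u][x])
        y = x + rng.normal(scale=0.5)
        U.append(u); X.append(x); Y.append(y)
    return np.array(X), np.array(U), np.array(Y)
```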