Statistical Themes and Lessons for Data Mining, 1997
Cited by 33 (3 self)
Data mining is on the interface of Computer Science and Statistics, utilizing advances in both disciplines to make progress in extracting information from large databases. It is an emerging field that has attracted much attention in a very short period of time. This article highlights some statistical themes and lessons that are directly relevant to data mining and attempts to identify opportunities where close cooperation between the statistical and computational communities might reasonably provide synergy for further progress in data analysis.
A Hybrid Anytime Algorithm for the Construction of Causal Models From Sparse Data
Proceedings of the Fifteenth International Conference on Uncertainty in Artificial Intelligence, 1999
Cited by 29 (3 self)
We present a hybrid constraint-based/Bayesian algorithm for learning causal networks in the presence of sparse data. The algorithm searches the space of equivalence classes of models (essential graphs) using a heuristic based on conventional constraint-based techniques. Each essential graph is then converted into a directed acyclic graph and scored using a Bayesian scoring metric. Two variants
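The anytime control flow this abstract describes, propose candidates with a cheap heuristic and keep the best one under a Bayesian score until the time budget runs out, can be sketched generically. Everything below (function names, the stand-in neighbor/score functions) is illustrative, not the paper's actual algorithm.

```python
import time

def anytime_search(initial, neighbors, score, budget_s=1.0):
    """Generic anytime best-first skeleton: `neighbors` stands in for the
    constraint-based heuristic proposing essential graphs, and `score`
    stands in for Bayesian scoring of the converted DAG.  Returns the
    best candidate found when the frontier or the time budget is
    exhausted, so an answer is always available."""
    best, best_score = initial, score(initial)
    deadline = time.monotonic() + budget_s
    frontier = [initial]
    while frontier and time.monotonic() < deadline:
        cand = frontier.pop()
        for nb in neighbors(cand):
            s = score(nb)
            if s > best_score:           # keep only improving candidates
                best, best_score = nb, s
                frontier.append(nb)
    return best
```

As a toy usage, searching the integers 0..10 with `score` as identity climbs to the maximum: `anytime_search(0, lambda x: [x + 1] if x < 10 else [], lambda x: x)` returns 10.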
A new approach for learning belief networks using independence criteria
International Journal of Approximate Reasoning, 2000
Causal Inference in the Presence of Latent Variables and Selection Bias
In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence
Cited by 28 (4 self)
This paper uses Bayesian network models for that investigation. Bayesian networks, or directed acyclic graph (DAG) models, have proved very useful in representing both causal and statistical hypotheses. The nodes of the graph represent random variables, directed edges represent direct influences, and the topology of the graph encodes statistical constraints. We will consider features of such models that can be determined from data under assumptions that are related to those routinely applied in experimental situations:
Parameter priors for directed acyclic graphical models and the characterization of several probability distributions
Microsoft Research, Advanced Technology Division, 1999
Cited by 26 (1 self)
We show that the only parameter prior for complete Gaussian DAG models that satisfies global parameter independence, complete model equivalence, and some weak regularity assumptions is the normal-Wishart distribution. Our analysis is based on the following new characterization of the Wishart distribution: let W be an n × n, n ≥ 3, positive-definite symmetric matrix of random variables and f(W) be a pdf of W. Then, f(W) is a Wishart distribution if and only if W11 − W12 W22^{-1} W'12 is independent of {W12, W22} for every block partitioning
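Written out under the standard symmetric block partition of W (an assumption read off from the notation in the abstract), the characterization is:

```latex
W = \begin{pmatrix} W_{11} & W_{12} \\ W_{12}' & W_{22} \end{pmatrix},
\qquad
f(W)\ \text{is Wishart}
\iff
\underbrace{W_{11} - W_{12}\,W_{22}^{-1}\,W_{12}'}_{\text{Schur complement of }W_{22}}
\ \perp\ \{\,W_{12},\ W_{22}\,\}
\ \text{for every such partition.}
```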
Likelihoods and Parameter Priors for Bayesian Networks, 1995
Cited by 23 (0 self)
We develop simple methods for constructing likelihoods and parameter priors for learning about the parameters and structure of a Bayesian network. In particular, we introduce several assumptions that permit the construction of likelihoods and parameter priors for a large number of Bayesian-network structures from a small set of assessments. The most notable assumption is that of likelihood equivalence, which says that data cannot help to discriminate network structures that encode the same assertions of conditional independence. We describe the constructions that follow from these assumptions, and also present a method for directly computing the marginal likelihood of a random sample with no missing observations. Also, we show how these assumptions lead to a general framework for characterizing parameter priors of multivariate distributions.
Keywords: Bayesian network, learning, likelihood equivalence, Dirichlet, normal-Wishart.
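The closed-form marginal likelihood this abstract mentions reduces, for a single discrete variable with a Dirichlet prior, to the Dirichlet-multinomial formula. The sketch below shows only that one building block (function name and the choice of a uniform prior in the usage example are illustrative, not taken from the paper):

```python
from math import lgamma

def log_marginal_likelihood(counts, alphas):
    """Log marginal likelihood of multinomial counts under a
    Dirichlet(alphas) prior -- the Dirichlet-multinomial formula:
    log Gamma(a)/Gamma(a+n) + sum_k log Gamma(a_k+n_k)/Gamma(a_k)."""
    n, a = sum(counts), sum(alphas)
    logp = lgamma(a) - lgamma(a + n)
    for n_k, a_k in zip(counts, alphas):
        logp += lgamma(a_k + n_k) - lgamma(a_k)
    return logp
```

For example, one observation of a binary variable under a uniform Dirichlet(1, 1) prior has marginal probability 1/2, so `log_marginal_likelihood([1, 0], [1, 1])` equals log(1/2).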
A Scoring Function for Learning Bayesian Networks based on Mutual Information and Conditional Independence Tests
Journal of Machine Learning Research, 2006
Cited by 17 (0 self)
We propose a new scoring function for learning Bayesian networks from data using score search algorithms. This is based on the concept of mutual information and exploits some well-known properties of this measure in a novel way. Essentially, a statistical independence test based on the chi-square distribution, associated with the mutual information measure, together with a property of additive decomposition of this measure, are combined in order to measure the degree of interaction between each variable and its parent variables in the network. The result is a non-Bayesian scoring function called MIT (mutual information tests) which belongs to the family of scores based on information theory. The MIT score also represents a penalization of the Kullback-Leibler divergence between the joint probability distributions associated with a candidate network and with the available data set. Detailed results of a complete experimental evaluation of the proposed scoring function and its comparison with the well-known K2, BDeu and BIC/MDL scores are also presented.
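The chi-square/mutual-information link the abstract relies on is the classical fact that 2·N·MI(X; Y), computed from a contingency table of N records, is asymptotically chi-square distributed under independence (the G statistic). A minimal sketch of that statistic, not of the full MIT score, which also adds the decomposition over parent sets:

```python
from math import log

def mutual_information(table):
    """Empirical mutual information (in nats) of a 2-D contingency table."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    mi = 0.0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            if n_ij:  # empty cells contribute nothing
                mi += (n_ij / n) * log(n * n_ij / (row_tot[i] * col_tot[j]))
    return mi

def g_statistic(table):
    """2*N*MI: asymptotically chi-square with (rows-1)*(cols-1) degrees
    of freedom when the two variables are independent."""
    n = sum(sum(row) for row in table)
    return 2 * n * mutual_information(table)
```

A perfectly independent table such as `[[10, 10], [10, 10]]` gives a statistic of 0, while a perfectly dependent one such as `[[20, 0], [0, 20]]` gives a large value, far beyond the 1-d.o.f. chi-square critical region.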
Robust independence testing for constraint-based learning of causal structure
In UAI, 2003
Cited by 14 (1 self)
This paper considers a method that combines ideas from Bayesian learning, Bayesian network inference, and classical hypothesis testing to produce a more reliable and robust test of independence for constraint-based (CB) learning of causal structure. Our method produces a smoothed contingency table N_XYZ that can be used with any test of independence that relies on contingency table statistics. N_XYZ can be calculated in the same asymptotic time and space required to calculate a standard contingency table, allows the specification of a prior distribution over parameters, and can be calculated when the database is incomplete. We provide theoretical justification for the procedure, and with synthetic data we demonstrate its benefits empirically over both a CB algorithm using the standard contingency table, and over a greedy Bayesian algorithm. We show that, even when used with non-informative priors, it results in better recovery of structural features and produces networks with smaller KL-divergence, especially as the number of nodes increases or the number of records decreases. Another benefit is the dramatic reduction in the probability that a CB algorithm will stall during the search, providing a remedy for an annoying problem plaguing CB learning when the database is small.
The TETRAD Project: Constraint Based Aids to Causal Model Specification
Multivariate Behavioral Research
Sentiment Extraction From Unstructured Text Using Tabu Search-Enhanced Markov Blanket
In Proceedings of the Workshop on Mining the Semantic Web, ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2004
Cited by 11 (3 self)
Extracting sentiments from unstructured text has emerged as an important problem in many disciplines. An accurate method would enable us, for example, to mine online opinions from the Internet and learn customers' preferences for economic or marketing research, or for leveraging a strategic advantage. In this paper, we propose a two-stage Bayesian algorithm that is able to capture the dependencies among words and, at the same time, finds a vocabulary that is efficient for the purpose of extracting sentiments. Experimental results on the Movie Reviews data set show that our algorithm is able to select a parsimonious feature set with substantially fewer predictor variables than in the full data set and leads to better predictions about sentiment orientations than several state-of-the-art machine learning methods. Our findings suggest that sentiments are captured by conditional dependence relations