Results 1-10 of 24
Markov Logic Networks
Machine Learning, 2006
Abstract

Cited by 569 (34 self)
We propose a simple approach to combining first-order logic and probabilistic graphical models in a single representation. A Markov logic network (MLN) is a first-order knowledge base with a weight attached to each formula (or clause). Together with a set of constants representing objects in the domain, it specifies a ground Markov network containing one feature for each possible grounding of a first-order formula in the KB, with the corresponding weight. Inference in MLNs is performed by MCMC over the minimal subset of the ground network required for answering the query. Weights are efficiently learned from relational databases by iteratively optimizing a pseudo-likelihood measure. Optionally, additional clauses are learned using inductive logic programming techniques. Experiments with a real-world database and knowledge base in a university domain illustrate the promise of this approach.
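As a concrete illustration of the representation this abstract describes, the sketch below grounds a single weighted formula over a tiny domain and computes P(X = x) = (1/Z) exp(sum_i w_i n_i(x)) by brute-force enumeration of worlds. The predicates (Smokes, Friends), constants, and weight are invented for illustration; they are not taken from the paper.

```python
import itertools
import math

# Hypothetical domain: two constants and two predicates.
constants = ["Anna", "Bob"]

def smokes(world, p):
    return world["Smokes"][p]

def friends(world, p, q):
    return world["Friends"][(p, q)]

# One weighted formula: Friends(p, q) => (Smokes(p) <=> Smokes(q)), weight 1.1.
# n_i(x) counts its true groundings in world x.
def n_friends_smoke(world):
    count = 0
    for p, q in itertools.product(constants, repeat=2):
        holds = (not friends(world, p, q)) or (smokes(world, p) == smokes(world, q))
        count += int(holds)
    return count

formulas = [(1.1, n_friends_smoke)]

def unnormalized_weight(world):
    return math.exp(sum(w * n(world) for w, n in formulas))

# Enumerate every possible world; feasible only for tiny domains.
def all_worlds():
    pairs = list(itertools.product(constants, repeat=2))
    for s_bits in itertools.product([False, True], repeat=len(constants)):
        for f_bits in itertools.product([False, True], repeat=len(pairs)):
            yield {"Smokes": dict(zip(constants, s_bits)),
                   "Friends": dict(zip(pairs, f_bits))}

Z = sum(unnormalized_weight(w) for w in all_worlds())

# Probability of one fully specified world.
some_world = {"Smokes": {"Anna": True, "Bob": True},
              "Friends": {p: True for p in itertools.product(constants, repeat=2)}}
prob = unnormalized_weight(some_world) / Z
```

Real MLN systems never enumerate worlds like this; the point is only to make the "one feature per grounding, with the formula's weight" construction tangible.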
Learning the structure of Markov logic networks
In Proceedings of the 22nd International Conference on Machine Learning, 2005
Abstract

Cited by 88 (17 self)
Markov logic networks (MLNs) combine logic and probability by attaching weights to first-order clauses, and viewing these as templates for features of Markov networks. In this paper we develop an algorithm for learning the structure of MLNs from relational databases, combining ideas from inductive logic programming (ILP) and feature induction in Markov networks. The algorithm performs a beam or shortest-first search of the space of clauses, guided by a weighted pseudo-likelihood measure. This requires computing the optimal weights for each candidate structure, but we show how this can be done efficiently. The algorithm can be used to learn an MLN from scratch, or to refine an existing knowledge base. We have applied it in two real-world domains, and found that it outperforms using off-the-shelf ILP systems to learn the MLN structure, as well as pure ILP, purely probabilistic and purely knowledge-based approaches.
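The pseudo-likelihood measure mentioned above can be sketched generically for a log-linear model over binary variables: each variable's conditional log-probability given the rest of the assignment is computed from the two competing values of that variable. The feature functions and weights below are hypothetical stand-ins, not the paper's weighted per-predicate formulation.

```python
import math

# `features` is a hypothetical list of (weight, feature_fn) pairs, where each
# feature_fn maps a full 0/1 assignment (a tuple) to a count.

def energy(x, features):
    return sum(w * f(x) for w, f in features)

def pseudo_log_likelihood(x, features):
    # PLL(x) = sum_l log P(X_l = x_l | x_{-l})
    pll = 0.0
    for l in range(len(x)):
        x0 = tuple(0 if i == l else v for i, v in enumerate(x))
        x1 = tuple(1 if i == l else v for i, v in enumerate(x))
        e0, e1 = energy(x0, features), energy(x1, features)
        chosen = e1 if x[l] == 1 else e0
        # log P(x_l | x_{-l}) = chosen - log(exp(e0) + exp(e1))
        pll += chosen - math.log(math.exp(e0) + math.exp(e1))
    return pll

# Example: two binary variables, one feature rewarding agreement.
features = [(0.5, lambda x: int(x[0] == x[1]))]
score = pseudo_log_likelihood((1, 1), features)
```

Because each conditional only needs local recomputation, this objective avoids the partition function, which is why it scales to structure search over many candidate clause sets.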
Optimal reinsertion: A new search operator for accelerated and more accurate Bayesian network structure learning
In Proceedings of the 20th Intl. Conf. on Machine Learning, 2003
Abstract

Cited by 40 (6 self)
We show how a conceptually simple search operator called Optimal Reinsertion can be applied to learning Bayesian network structure from data. On each step we pick a node called the target. We delete all arcs entering or exiting the target. We then find, subject to some constraints, the globally optimal combination of in-arcs and out-arcs with which to reinsert it. The heart of the paper is a new algorithm called ORSearch which allows each optimal reinsertion step to be computed efficiently on large datasets. Our empirical results compare Optimal Reinsertion against a highly tuned implementation of multi-restart hill climbing. The results typically show one to two orders of magnitude speedup on a variety of datasets. They usually show better final results, both in terms of BDEU score and in modeling of future data drawn from the same distribution. 1. Bayesian Network Structure Search: Given a dataset of R records and m categorical attributes, how can we find a Bayesian network structure that provides a good model of the data? Happily, the formulation of this question into a well-defined optimization problem is now fairly well understood (Heckerman et al., 1995; Cooper & Herskovits, 1992). However, finding the optimal solution is an NP-complete problem (Chickering, 1996a). The computational issues in performing heuristic search in this space are also severe, even taking into account the numerous ingenious and effective innovations in recent years.
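A brute-force toy version of the operator described above (deliberately not the efficient ORSearch algorithm) might look like the following. The score function is a stand-in for a real network score such as BDeu; here a hypothetical score is passed in by the caller.

```python
import itertools

def is_acyclic(nodes, arcs):
    # Kahn-style check: repeatedly peel off nodes with no incoming arcs.
    arcs = set(arcs)
    remaining = set(nodes)
    while remaining:
        sources = [n for n in remaining
                   if not any(a[1] == n and a[0] in remaining for a in arcs)]
        if not sources:
            return False  # a cycle remains
        remaining -= set(sources)
    return True

def optimal_reinsertion(nodes, arcs, target, score):
    """Delete all arcs touching `target`, then exhaustively try every
    combination of in-arcs and out-arcs, keeping the best acyclic one."""
    others = [n for n in nodes if n != target]
    kept = [a for a in arcs if target not in a]
    best, best_score = None, float("-inf")
    subsets = lambda: itertools.chain.from_iterable(
        itertools.combinations(others, k) for k in range(len(others) + 1))
    for parents in subsets():
        for children in subsets():
            candidate = (kept + [(p, target) for p in parents]
                              + [(target, c) for c in children])
            if not is_acyclic(nodes, candidate):
                continue
            s = score(candidate)
            if s > best_score:
                best, best_score = candidate, s
    return best, best_score

# Toy usage: reinsert "C" into a 3-node network, with a made-up score
# (here simply "more arcs is better", subject to acyclicity).
nodes = ["A", "B", "C"]
best_arcs, best_score = optimal_reinsertion(nodes, [("A", "B")], "C", score=len)
```

The double loop over parent and child subsets is exponential; the paper's contribution is precisely computing this step efficiently on large datasets, which this sketch makes no attempt to do.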
Tractable learning of large Bayes net structures from sparse data
2004
Abstract

Cited by 29 (4 self)
This paper addresses three questions. Is it useful to attempt to learn a Bayesian network structure with hundreds of thousands of nodes? How should such structure search proceed practically? The third question arises out of our approach to the second: how can Frequent Sets (Agrawal et al., 1993), which are extremely popular in the area of descriptive data mining, be turned into a probabilistic model? Large sparse datasets with hundreds of thousands of records and attributes appear in social networks, warehousing, supermarket transactions and web logs. The complexity of structural search has made learning of factored probabilistic models on such datasets infeasible. We propose to use Frequent Sets to significantly speed up the structural search. Unlike previous approaches, we not only cache n-way sufficient statistics, but also exploit their local structure. We also present an empirical evaluation of our algorithm applied to several massive datasets.
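Frequent Sets, the primitive this abstract builds on, can be computed with a minimal Apriori-style levelwise count; the sketch below returns every itemset whose support meets a threshold. The transactions and threshold are made-up examples, and no real implementation would count candidates this naively.

```python
from itertools import combinations

def frequent_sets(transactions, min_support):
    """Return {itemset: support} for all itemsets occurring in at least
    `min_support` transactions, found by levelwise candidate generation."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    frequent = {}
    level = [frozenset([i]) for i in items]  # level 1: single items
    k = 1
    while level:
        counts = {c: sum(1 for t in transactions if c <= t) for c in level}
        survivors = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(survivors)
        # Candidates for the next level: unions of surviving k-sets
        # that yield (k+1)-sets.
        keys = list(survivors)
        level = list({a | b for a, b in combinations(keys, 2)
                      if len(a | b) == k + 1})
        k += 1
    return frequent

freq = frequent_sets([{"a", "b"}, {"a", "b", "c"}, {"a", "c"}], min_support=2)
```

The support counts themselves are exactly the cached n-way sufficient statistics the abstract refers to: the number of records in which a given combination of attribute values co-occurs.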
Graphical models of residue coupling in protein families
In 5th ACM SIGKDD Workshop on Data Mining in Bioinformatics (BIOKDD), 2005
Abstract

Cited by 24 (8 self)
Identifying residue coupling relationships within a protein family can provide important insights into the family’s evolutionary record, and has significant applications in analyzing and optimizing sequence-structure-function relationships. We present the first algorithm to infer an undirected graphical model representing residue coupling in protein families. Such a model, which we call a residue coupling network, serves as a compact description of the joint amino acid distribution, focused on the independences among residues. This stands in contrast to current methods, which manipulate dense representations of covariation and are focused on assessing dependence, which can conflate direct and indirect relationships. Our probabilistic model provides a sound basis for predictive (will this newly designed protein be folded and functional?), diagnostic (why is this protein not stable or functional?), and abductive reasoning (what if I attempt to graft features of one protein family onto another?). Further, our algorithm can readily incorporate, as priors, hypotheses regarding possible underlying mechanistic/energetic explanations for coupling. The resulting approach constitutes a powerful and discriminatory mechanism to identify residue coupling from protein sequences and structures. Analysis results on the G-protein coupled receptor (GPCR) and PDZ domain families demonstrate the ability of our approach to effectively uncover and exploit models of residue coupling.
A General Framework for Mining Massive Data Streams
Journal of Computational and Graphical Statistics, 2003
Abstract

Cited by 18 (0 self)
In many domains, data now arrives faster than we are able to mine it. To avoid wasting this data, we must switch from the traditional "one-shot" data mining approach to systems that are able to mine continuous, high-volume, open-ended data streams as they arrive. In this extended abstract we identify some desiderata for such systems, and outline our framework for realizing them. A key property of our approach is that it minimizes the time required to build a model on a stream, while guaranteeing (as long as the data is i.i.d.) that the model learned is effectively indistinguishable from the one that would be obtained using infinite data. Using this framework, we have successfully adapted several learning algorithms to massive data streams, including decision tree induction, Bayesian network learning, k-means clustering, and the EM algorithm for mixtures of Gaussians. These algorithms are able to process on the order of billions of examples per day using off-the-shelf hardware. Building on this, we are currently developing software primitives for scaling arbitrary learning algorithms to massive data streams with minimal effort.
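The "effectively indistinguishable from infinite data" guarantee in this line of work (e.g. Hoeffding trees) typically rests on the Hoeffding bound: after n i.i.d. observations of a statistic with range R, the true mean lies within epsilon of the observed mean with probability 1 - delta, so a decision (such as a tree split) can be committed as soon as the observed gap between the best and second-best candidate exceeds epsilon. The numbers below are illustrative only.

```python
import math

def hoeffding_epsilon(value_range, delta, n):
    # epsilon = sqrt(R^2 * ln(1/delta) / (2n))
    return math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n))

def can_decide(best_gain, second_gain, value_range, delta, n):
    """True once the observed gap exceeds the Hoeffding bound, i.e. the
    best candidate is very likely best under infinite data too."""
    return (best_gain - second_gain) > hoeffding_epsilon(value_range, delta, n)

# With few examples a 0.05 gap is inconclusive; with many, it suffices.
early = can_decide(0.30, 0.25, value_range=1.0, delta=1e-6, n=100)
late = can_decide(0.30, 0.25, value_range=1.0, delta=1e-6, n=20000)
```

Because epsilon shrinks as 1/sqrt(n), each decision waits only for as many stream examples as that decision actually needs, which is what keeps model construction time per example constant.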
Anytime classification using the nearest neighbor algorithm with applications to stream mining
IEEE International Conference on Data Mining (ICDM), 2006
Abstract

Cited by 17 (8 self)
For many real-world problems we must perform classification under widely varying amounts of computational resources. For example, if asked to classify an instance taken from a bursty stream, we may have from milliseconds to minutes to return a class prediction. For such problems an anytime algorithm may be especially useful. In this work we show how we can convert the ubiquitous nearest neighbor classifier into an anytime algorithm that can produce an instant classification, or if given the luxury of additional time, can utilize the extra time to increase classification accuracy. We demonstrate the utility of our approach with a comprehensive set of experiments on data from diverse domains.
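A minimal sketch of the anytime idea applied to 1-NN: scan the training examples in a fixed (ideally utility-ordered) sequence and answer with the best match seen so far whenever interrupted. The data, the ordering, and the step budget below are hypothetical; the paper's contribution lies in how the examples are ordered, which this sketch does not attempt.

```python
def anytime_1nn(query, ordered_train, budget):
    """Examine at most `budget` training examples; return the label of the
    closest one seen so far (squared Euclidean distance)."""
    best_dist, best_label = float("inf"), None
    for i, (x, label) in enumerate(ordered_train):
        if i >= budget:
            break  # interrupted: answer with the best seen so far
        d = sum((a - b) ** 2 for a, b in zip(query, x))
        if d < best_dist:
            best_dist, best_label = d, label
    return best_label

# Hypothetical 2-D training data, already in scan order.
train = [((0.0, 0.0), "neg"), ((5.0, 5.0), "pos"), ((1.1, 0.9), "pos")]
quick = anytime_1nn((1.0, 1.0), train, budget=1)  # only the first example seen
full = anytime_1nn((1.0, 1.0), train, budget=3)   # all examples seen
```

Any budget from 1 upward yields a valid prediction, and larger budgets can only bring the answer closer to the exact 1-NN result, which is the defining property of an anytime classifier.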
Learning Arithmetic Circuits
Abstract

Cited by 14 (5 self)
Graphical models are usually learned without regard to the cost of doing inference with them. As a result, even if a good model is learned, it may perform poorly at prediction, because it requires approximate inference. We propose an alternative: learning models with a score function that directly penalizes the cost of inference. Specifically, we learn arithmetic circuits with a penalty on the number of edges in the circuit (in which the cost of inference is linear). Our algorithm is equivalent to learning a Bayesian network with context-specific independence by greedily splitting conditional distributions, at each step scoring the candidates by compiling the resulting network into an arithmetic circuit, and using its size as the penalty. We show how this can be done efficiently, without compiling a circuit from scratch for each candidate. Experiments on several real-world domains show that our algorithm is able to learn tractable models with very large treewidth, and yields more accurate predictions than a standard context-specific Bayesian network learner, in far less time.
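The inference-aware score the abstract describes can be sketched as training log-likelihood minus a per-edge cost on the compiled circuit, so that an expensive-to-query model loses even when it fits slightly better. The likelihoods, edge counts, and penalty weight below are made-up numbers, not results from the paper.

```python
def circuit_score(log_likelihood, circuit_edges, edge_penalty=0.1):
    """Score a candidate model: fit to data minus a cost proportional to the
    size (edge count) of its compiled arithmetic circuit, in which the cost
    of exact inference is linear."""
    return log_likelihood - edge_penalty * circuit_edges

# A slightly better fit does not justify a much larger circuit.
compact = circuit_score(log_likelihood=-120.0, circuit_edges=50)
bloated = circuit_score(log_likelihood=-118.0, circuit_edges=500)
```

Under a plain likelihood score the second model would win; under the edge-penalized score the compact one does, which is the trade-off the learning algorithm exploits at every greedy split.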
Learning Bayesian Networks From Dependency Networks: A Preliminary Study
In Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics, Key West, FL, 2003
Abstract

Cited by 7 (1 self)
In this paper we describe how to learn Bayesian networks from a summary of complete data in the form of a dependency network rather than from data directly. This method allows us to gain the advantages of both representations: scalable algorithms for learning dependency networks and convenient inference with Bayesian networks. Our approach is to use a dependency network as an "oracle" for the statistics needed to learn a Bayesian network. We show that the general problem is NP-hard and develop a greedy search algorithm. We conduct a preliminary experimental evaluation and find that the prediction accuracy of the Bayesian networks constructed from our algorithm almost equals that of Bayesian networks learned directly from the data.
Mining Massive Relational Databases
Abstract

Cited by 7 (2 self)
There is a large and growing mismatch between the size of the relational data sets available for mining and the amount of data our relational learning systems can process. In particular, most relational learning systems can operate on data sets containing thousands to tens of thousands of objects, while many real-world data sets grow at a rate of millions of objects a day. In this paper we explore the challenges that prevent relational learning systems from operating on massive data sets, and develop a learning system that overcomes some of them. Our system uses sampling, is efficient with disk accesses, and is able to learn from an order of magnitude more relational data than existing algorithms. We evaluate our system by using it to mine a collection of massive Web crawls, each containing millions of pages.