Results 1  10
of
401
Markov Logic Networks
 Machine Learning
, 2006
"... Abstract. We propose a simple approach to combining firstorder logic and probabilistic graphical models in a single representation. A Markov logic network (MLN) is a firstorder knowledge base with a weight attached to each formula (or clause). Together with a set of constants representing objects ..."
Abstract

Cited by 568 (33 self)
 Add to MetaCart
Abstract. We propose a simple approach to combining firstorder logic and probabilistic graphical models in a single representation. A Markov logic network (MLN) is a firstorder knowledge base with a weight attached to each formula (or clause). Together with a set of constants representing objects in the domain, it specifies a ground Markov network containing one feature for each possible grounding of a firstorder formula in the KB, with the corresponding weight. Inference in MLNs is performed by MCMC over the minimal subset of the ground network required for answering the query. Weights are efficiently learned from relational databases by iteratively optimizing a pseudolikelihood measure. Optionally, additional clauses are learned using inductive logic programming techniques. Experiments with a realworld database and knowledge base in a university domain illustrate the promise of this approach.
Dynamic Bayesian Networks: Representation, Inference and Learning
, 2002
"... Modelling sequential data is important in many areas of science and engineering. Hidden Markov models (HMMs) and Kalman filter models (KFMs) are popular for this because they are simple and flexible. For example, HMMs have been used for speech recognition and biosequence analysis, and KFMs have bee ..."
Abstract

Cited by 565 (3 self)
 Add to MetaCart
Modelling sequential data is important in many areas of science and engineering. Hidden Markov models (HMMs) and Kalman filter models (KFMs) are popular for this because they are simple and flexible. For example, HMMs have been used for speech recognition and biosequence analysis, and KFMs have been used for problems ranging from tracking planes and missiles to predicting the economy. However, HMMs
and KFMs are limited in their “expressive power”. Dynamic Bayesian Networks (DBNs) generalize HMMs by allowing the state space to be represented in factored form, instead of as a single discrete random variable. DBNs generalize KFMs by allowing arbitrary probability distributions, not just (unimodal) linearGaussian. In this thesis, I will discuss how to represent many different kinds of models as DBNs, how to perform exact and approximate inference in DBNs, and how to learn DBN models from sequential data.
In particular, the main novel technical contributions of this thesis are as follows: a way of representing
Hierarchical HMMs as DBNs, which enables inference to be done in O(T) time instead of O(T 3), where T is the length of the sequence; an exact smoothing algorithm that takes O(log T) space instead of O(T); a simple way of using the junction tree algorithm for online inference in DBNs; new complexity bounds on exact online inference in DBNs; a new deterministic approximate inference algorithm called factored frontier; an analysis of the relationship between the BK algorithm and loopy belief propagation; a way of
applying RaoBlackwellised particle filtering to DBNs in general, and the SLAM (simultaneous localization
and mapping) problem in particular; a way of extending the structural EM algorithm to DBNs; and a variety of different applications of DBNs. However, perhaps the main value of the thesis is its catholic presentation of the field of sequential data modelling.
Discriminative probabilistic models for relational data
, 2002
"... In many supervised learning tasks, the entities to be labeled are related to each other in complex ways and their labels are not independent. For example, in hypertext classification, the labels of linked pages are highly correlated. A standard approach is to classify each entity independently, igno ..."
Abstract

Cited by 349 (11 self)
 Add to MetaCart
In many supervised learning tasks, the entities to be labeled are related to each other in complex ways and their labels are not independent. For example, in hypertext classification, the labels of linked pages are highly correlated. A standard approach is to classify each entity independently, ignoring the correlations between them. Recently, Probabilistic Relational Models, a relational version of Bayesian networks, were used to define a joint probabilistic model for a collection of related entities. In this paper, we present an alternative framework that builds on (conditional) Markov networks and addresses two limitations of the previous approach. First, undirected models do not impose the acyclicity constraint that hinders representation of many important relational dependencies in directed models. Second, undirected models are well suited for discriminative training, where we optimize the conditional likelihood of the labels given the features, which generally improves classification accuracy. We show how to train these models effectively, and how to use approximate probabilistic inference over the learned model for collective classification of multiple related entities. We provide experimental results on a webpage classification task, showing that accuracy can be significantly improved by modeling relational dependencies. 1
Mining the Network Value of Customers
 In Proceedings of the Seventh International Conference on Knowledge Discovery and Data Mining
, 2002
"... One of the major applications of data mining is in helping companies determine which potential customers to market to. If the expected pro t from a customer is greater than the cost of marketing to her, the marketing action for that customer is executed. So far, work in this area has considered only ..."
Abstract

Cited by 330 (11 self)
 Add to MetaCart
One of the major applications of data mining is in helping companies determine which potential customers to market to. If the expected pro t from a customer is greater than the cost of marketing to her, the marketing action for that customer is executed. So far, work in this area has considered only the intrinsic value of the customer (i.e, the expected pro t from sales to her). We propose to model also the customer's network value: the expected pro t from sales to other customers she may inuence to buy, the customers those may inuence, and so on recursively. Instead of viewing a market as a set of independent entities, we view it as a social network and model it as a Markov random eld. We show the advantages of this approach using a social network mined from a collaborative ltering database. Marketing that exploits the network value of customersalso known as viral marketingcan be extremely eective, but is still a black art. Our work can be viewed as a step towards providing a more solid foundation for it, taking advantage of the availability of large relevant databases. Categories and Subject Descriptors H.2.8 [Database Management]: Database Applications data mining
Learning and inferring transportation routines
 Artificial Intelligence
"... This paper introduces a hierarchical Markov model that can learn and infer a user’s daily movements through the community. The model uses multiple levels of abstraction in order to bridge the gap between raw GPS sensor measurements and high level information such as a user’s mode of transportation o ..."
Abstract

Cited by 230 (20 self)
 Add to MetaCart
This paper introduces a hierarchical Markov model that can learn and infer a user’s daily movements through the community. The model uses multiple levels of abstraction in order to bridge the gap between raw GPS sensor measurements and high level information such as a user’s mode of transportation or her goal. We apply RaoBlackwellised particle filters for efficient inference both at the low level and at the higher levels of the hierarchy. Significant locations such as goals or locations where the user frequently changes mode of transportation are learned from GPS data logs without requiring any manual labeling. We show how to detect abnormal behaviors (e.g. taking a wrong bus) by concurrently tracking his activities with a trained and a prior model. Experiments show that our model is able to accurately predict the goals of a person and to recognize situations in which the user performs unknown activities.
The Missing Link  A Probabilistic Model of Document Content and Hypertext Connectivity
, 2001
"... We describe a joint probabilistic model for modeling the contents and interconnectivity of document collections such as sets of web pages or research paper archives. The model is based on a probabilistic factor decomposition and allows identifying principal topics of the collection as well as autho ..."
Abstract

Cited by 186 (3 self)
 Add to MetaCart
We describe a joint probabilistic model for modeling the contents and interconnectivity of document collections such as sets of web pages or research paper archives. The model is based on a probabilistic factor decomposition and allows identifying principal topics of the collection as well as authoritative documents within those topics. Furthermore, the relationships between topics is mapped out in order to build a predictive model of link content. Among the many applications of this approach are information retrieval and search, topic identification, query disambiguation, focused web crawling, web authoring, and bibliometric analysis.
Learning Bayesian network structure from massive datasets: The “sparse candidate” algorithm
, 1999
"... Learning Bayesian networks is often cast as an optimization problem, where the computational task is to find a structure that maximizes a statistically motivated score. By and large, existing learning tools address this optimization problem using standard heuristic search techniques. Since the searc ..."
Abstract

Cited by 180 (9 self)
 Add to MetaCart
Learning Bayesian networks is often cast as an optimization problem, where the computational task is to find a structure that maximizes a statistically motivated score. By and large, existing learning tools address this optimization problem using standard heuristic search techniques. Since the search space is extremely large, such search procedures can spend most of the time examining candidates that are extremely unreasonable. This problem becomes critical when we deal with data sets that are large either in the number of instances, or the number of attributes. In this paper, we introduce an algorithm that achieves faster learning by restricting the search space. This iterative algorithm restricts the parents of each variable to belong to a small subset of candidates. We then search for a network that satisfies these constraints. The learned network is then used for selecting better candidates for the next iteration. We evaluate this algorithm both on synthetic and reallife data. Our results show that it is significantly faster than alternative search procedures without loss of quality in the learned structures. 1
Representing and querying correlated tuples in probabilistic databases
 In ICDE
, 2007
"... Probabilistic databases have received considerable attention recently due to the need for storing uncertain data produced by many real world applications. The widespread use of probabilistic databases is hampered by two limitations: (1) current probabilistic databases make simplistic assumptions abo ..."
Abstract

Cited by 117 (11 self)
 Add to MetaCart
Probabilistic databases have received considerable attention recently due to the need for storing uncertain data produced by many real world applications. The widespread use of probabilistic databases is hampered by two limitations: (1) current probabilistic databases make simplistic assumptions about the data (e.g., complete independence among tuples) that make it difficult to use them in applications that naturally produce correlated data, and (2) most probabilistic databases can only answer a restricted subset of the queries that can be expressed using traditional query languages. We address both these limitations by proposing a framework that can represent not only probabilistic tuples, but also correlations that may be present among them. Our proposed framework naturally lends itself to the possible world semantics thus preserving the precise query semantics extant in current probabilistic databases. We develop an efficient strategy for query evaluation over such probabilistic databases by casting the query processing problem as an inference problem in an appropriately constructed probabilistic graphical model. We present several optimizations specific to probabilistic databases that enable efficient query evaluation. We validate our approach by presenting an experimental evaluation that illustrates the effectiveness of our techniques at answering various queries using real and synthetic datasets. 1
Learning relational probability trees
 In Proceedings of the ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD) (2003
"... Classification trees are widely used in the machine learning and data mining communities for modeling propositional data. Recent work has extended this basic paradigm to probability estimation trees. Traditional tree learning algorithms assume that instances in the training data are homogenous and i ..."
Abstract

Cited by 117 (33 self)
 Add to MetaCart
Classification trees are widely used in the machine learning and data mining communities for modeling propositional data. Recent work has extended this basic paradigm to probability estimation trees. Traditional tree learning algorithms assume that instances in the training data are homogenous and independently distributed. Relational probability trees (RPTs) extend standard probability estimation trees to a relational setting in which data instances are heterogeneous and interdependent. Our algorithm for learning the structure and parameters of an RPT searches over a space of relational features that use aggregation functions (e.g. AVERAGE, MODE, COUNT) to dynamically propositionalize relational data and create binary splits within the RPT. Previous work has identified a number of statistical biases due to characteristics of relational data such as autocorrelation and degree disparity. The RPT algorithm uses a novel form of randomization test to adjust for these biases. On a variety of relational learning tasks, RPTs built using randomization tests are significantly smaller than other models and achieve equivalent, or better, performance. 1.
Why Collective Inference Improves Relational Classification
 In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
, 2004
"... Procedures for collective inference make simultaneous statistical judgments about the same variables for a set of related data instances. For example, collective inference could be used to simultaneously classify a set of hyperlinked documents or infer the legitimacy of a set of related financial tr ..."
Abstract

Cited by 111 (24 self)
 Add to MetaCart
Procedures for collective inference make simultaneous statistical judgments about the same variables for a set of related data instances. For example, collective inference could be used to simultaneously classify a set of hyperlinked documents or infer the legitimacy of a set of related financial transactions. Several recent studies indicate that collective inference can significantly reduce classification error when compared with traditional inference techniques. We investigate the underlying mechanisms for this error reduction by reviewing past work on collective inference and characterizing different types of statistical models used for making inference in relational data. We show important differences among these models, and we characterize the necessary and sufficient conditions for reduced classification error based on experiments with real and simulated data.