Results 1-10 of 802
A Survey on Transfer Learning
Cited by 443 (22 self)
A major assumption in many machine learning and data mining algorithms is that the training and future data must be in the same feature space and have the same distribution. However, in many real-world applications, this assumption may not hold. For example, we sometimes have a classification task in one domain of interest, but we only have sufficient training data in another domain of interest, where the latter data may be in a different feature space or follow a different data distribution. In such cases, knowledge transfer, if done successfully, would greatly improve the performance of learning by avoiding much expensive data labeling efforts. In recent years, transfer learning has emerged as a new learning framework to address this problem. This survey focuses on categorizing and reviewing the current progress on transfer learning for classification, regression and clustering problems. In this survey, we discuss the relationship between transfer learning and other related machine learning techniques such as domain adaptation, multi-task learning and sample selection bias, as well as covariate shift. We also explore some potential future issues in transfer learning research.
Collective classification in network data
2008
Cited by 174 (33 self)
Numerous real-world applications produce networked data such as web data (hypertext documents connected via hyperlinks) and communication networks (people connected via communication links). A recent focus in machine learning research has been to extend traditional machine learning classification techniques to classify nodes in such data. In this report, we attempt to provide a brief introduction to this area of research and how it has progressed during the past decade. We introduce four of the most widely used inference algorithms for classifying networked data and empirically compare them on both synthetic and real-world data.
ProbLog: a probabilistic Prolog and its application in link discovery
In Proceedings of the 20th International Joint Conference on Artificial Intelligence, 2007
Cited by 146 (27 self)
We introduce ProbLog, a probabilistic extension of Prolog. A ProbLog program defines a distribution over logic programs by specifying for each clause the probability that it belongs to a randomly sampled program, and these probabilities are mutually independent. The semantics of ProbLog is then defined by the success probability of a query, which corresponds to the probability that the query succeeds in a randomly sampled program. The key contribution of this paper is the introduction of an effective solver for computing success probabilities. It essentially combines SLD-resolution with methods for computing the probability of Boolean formulae. Our implementation further employs an approximation algorithm that combines iterative deepening with binary decision diagrams. We report on experiments in the context of discovering links in real biological networks, a demonstration of the practical usefulness of the approach.
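The success-probability semantics described in this abstract can be illustrated with a minimal sketch. It assumes a toy setting in which the only probabilistic clauses are directed edges and the query asks whether one node can reach another. It enumerates every sampled program by brute force, which is not the BDD-based solver the paper introduces and only works for a handful of clauses. The names `reachable`, `success_probability` and `PROB_EDGES`, and the edge probabilities, are hypothetical.

```python
from itertools import product


def reachable(edges, source, target):
    """Return True if target can be reached from source along directed edges."""
    frontier, seen = [source], {source}
    while frontier:
        node = frontier.pop()
        if node == target:
            return True
        for a, b in edges:
            if a == node and b not in seen:
                seen.add(b)
                frontier.append(b)
    return False


def success_probability(prob_edges, source, target):
    """Sum the probabilities of all sampled programs in which the query succeeds.

    Each edge is kept independently with its own probability, so a sampled
    program's probability is the product of p (kept) or 1 - p (dropped).
    """
    facts = list(prob_edges.items())
    total = 0.0
    for choices in product([True, False], repeat=len(facts)):
        weight = 1.0
        kept = []
        for (edge, p), keep in zip(facts, choices):
            weight *= p if keep else 1.0 - p
            if keep:
                kept.append(edge)
        if reachable(kept, source, target):
            total += weight
    return total


# Hypothetical toy network: edge -> probability that the edge is present.
PROB_EDGES = {("a", "b"): 0.8, ("b", "c"): 0.5, ("a", "c"): 0.1}
```

Calling `success_probability(PROB_EDGES, "a", "c")` gives approximately 0.46, since the query succeeds if the direct edge is present (0.1) or, failing that, if both edges of the two-step path are present (0.9 × 0.8 × 0.5). The enumeration is exponential in the number of probabilistic clauses, which is the cost an effective solver has to avoid.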
Sound and efficient inference with probabilistic and deterministic dependencies
2006
Cited by 130 (17 self)
Reasoning with both probabilistic and deterministic dependencies is important for many real-world problems, and in particular for the emerging field of statistical relational learning. However, probabilistic inference methods like MCMC or belief propagation tend to give poor results when deterministic or near-deterministic dependencies are present, and logical ones like satisfiability testing are inapplicable to probabilistic dependencies. In this paper we propose MC-SAT, an inference algorithm that combines ideas from MCMC and satisfiability. MC-SAT is based on Markov logic, which defines Markov networks using weighted clauses in first-order logic. From the point of view of MCMC, MC-SAT is a slice sampler with an auxiliary variable per clause, and with a satisfiability-based method for sampling the original variables given the auxiliary ones. From the point of view of satisfiability, MC-SAT wraps a procedure around the SampleSAT uniform sampler that enables it to sample from highly non-uniform distributions over satisfying assignments. Experiments on entity resolution and collective classification problems show that MC-SAT greatly outperforms Gibbs sampling and simulated tempering over a broad range of problem sizes and degrees of determinism.
Lifted first-order probabilistic inference
In Proceedings of IJCAI-05, 19th International Joint Conference on Artificial Intelligence, 2005
Cited by 125 (8 self)
Most probabilistic inference algorithms are specified and processed on a propositional level. In the last decade, many proposals for algorithms accepting first-order specifications have been presented, but in the inference stage they still operate on a mostly propositional representation level. [Poole, 2003] presented a method to perform inference directly on the first-order level, but this method is limited to special cases. In this paper we present the first exact inference algorithm that operates directly on a first-order level, and that can be applied to any first-order model (specified in a language that generalizes undirected graphical models). Our experiments show superior performance in comparison with propositional exact inference.
Extracting places and activities from GPS traces using hierarchical conditional random fields
International Journal of Robotics Research, 2007
Cited by 119 (3 self)
Learning patterns of human behavior from sensor data is extremely important for high-level activity inference. We show how to extract a person’s activities and significant places from traces of GPS data. Our system uses hierarchically structured conditional random fields to generate a consistent model of a person’s activities and places. In contrast to existing techniques, our approach takes high-level context into account in order to detect the significant places of a person. Our experiments show significant improvements over existing techniques. Furthermore, they indicate that our system is able to robustly estimate a person’s activities using a model that is trained from data collected by other persons.
Joint inference in information extraction
In Proceedings of the 22nd National Conference on Artificial Intelligence, 2007
Cited by 118 (8 self)
The goal of information extraction is to extract database records from text or semi-structured sources. Traditionally, information extraction proceeds by first segmenting each candidate record separately, and then merging records that refer to the same entities. While computationally efficient, this approach is suboptimal, because it ignores the fact that segmenting one candidate record can help to segment similar ones. For example, resolving a well-segmented field with a less-clear one can disambiguate the latter’s boundaries. In this paper we propose a joint approach to information extraction, where segmentation of all records and entity resolution are performed together in a single integrated inference process. While a number of previous authors have taken steps in this direction (e.g., Pasula et al. (2003), Wellner et al. (2004)), to our knowledge this is the first fully joint approach. In experiments on the CiteSeer and Cora citation matching datasets, joint inference improved accuracy, and our approach outperformed previous ones. Further, by using Markov logic and the existing algorithms for it, our solution consisted mainly of writing the appropriate logical formulas, and required much less engineering than previous ones.
Learning the structure of Markov logic networks
In Proceedings of the 22nd International Conference on Machine Learning, 2005
Cited by 114 (20 self)
Markov logic networks (MLNs) combine logic and probability by attaching weights to first-order clauses, and viewing these as templates for features of Markov networks. In this paper we develop an algorithm for learning the structure of MLNs from relational databases, combining ideas from inductive logic programming (ILP) and feature induction in Markov networks. The algorithm performs a beam or shortest-first search of the space of clauses, guided by a weighted pseudo-likelihood measure. This requires computing the optimal weights for each candidate structure, but we show how this can be done efficiently. The algorithm can be used to learn an MLN from scratch, or to refine an existing knowledge base. We have applied it in two real-world domains, and found that it outperforms using off-the-shelf ILP systems to learn the MLN structure, as well as pure ILP, purely probabilistic and purely knowledge-based approaches.
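To make "weighted clauses as templates for features" concrete, the following is a minimal sketch of the standard MLN distribution, in which a world's probability is proportional to exp(w · n), with n the number of true groundings of a clause and w its weight. It assumes a single hypothetical clause Smokes(x) => Cancer(x) over a two-constant domain and enumerates all worlds by brute force. It illustrates only the model definition, not the structure-learning algorithm of the paper, and every name and number in it is an assumption made for illustration.

```python
from itertools import product
from math import exp

CONSTANTS = ["anna", "bob"]  # hypothetical domain of constants
WEIGHT = 1.5                 # hypothetical weight of Smokes(x) => Cancer(x)


def count_true_groundings(world):
    """Count constants x for which Smokes(x) => Cancer(x) holds in the world."""
    return sum(
        1
        for x in CONSTANTS
        if (not world[("Smokes", x)]) or world[("Cancer", x)]
    )


def world_distribution():
    """Return (world, probability) pairs for every truth assignment.

    Each world scores exp(WEIGHT * n), where n is its number of true
    groundings; dividing by the sum of all scores normalizes the scores.
    """
    atoms = [(pred, x) for pred in ("Smokes", "Cancer") for x in CONSTANTS]
    worlds = [
        dict(zip(atoms, values))
        for values in product([False, True], repeat=len(atoms))
    ]
    scores = [exp(WEIGHT * count_true_groundings(w)) for w in worlds]
    z = sum(scores)
    return [(w, s / z) for w, s in zip(worlds, scores)]
```

A world that violates one grounding of the clause is less probable than an otherwise comparable world that violates none by a factor of exp(WEIGHT), rather than being impossible as it would be in pure first-order logic.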
Lifted first-order belief propagation
In Association for the Advancement of Artificial Intelligence (AAAI), 2008
Cited by 112 (15 self)
Unifying first-order logic and probability is a longstanding goal of AI, and in recent years many representations combining aspects of the two have been proposed. However, inference in them is generally still at the level of propositional logic, creating all ground atoms and formulas and applying standard probabilistic inference methods to the resulting network. Ideally, inference should be lifted as in first-order logic, handling whole sets of indistinguishable objects together, in time independent of their cardinality. Poole (2003) and Braz et al. (2005, 2006) developed a lifted version of the variable elimination algorithm, but it is extremely complex, generally does not scale to realistic domains, and has only been applied to very small artificial problems. In this paper we propose the first lifted version of a scalable probabilistic inference algorithm, belief propagation (loopy or not). Our approach is based on first constructing a lifted network, where each node represents a set of ground atoms that all pass the same messages during belief propagation. We then run belief propagation on this network. We prove the correctness and optimality of our algorithm. Experiments show that it can greatly reduce the cost of inference.
Relational dependency networks
Journal of Machine Learning Research, 2007
Cited by 112 (24 self)
Recent work on graphical models for relational data has demonstrated significant improvements in classification and inference when models represent the dependencies among instances. Despite its use in conventional statistical models, the assumption of instance independence is contradicted by most relational datasets. For example, in citation data there are dependencies among the topics of a paper’s references, and in genomic data there are dependencies among the functions of interacting proteins. In this paper, we present relational dependency networks (RDNs), graphical models that are capable of expressing and reasoning with such dependencies in a relational setting. We discuss RDNs in the context of relational Bayes networks and relational Markov networks and outline the relative strengths of RDNs, namely the ability to represent cyclic dependencies, simple methods for parameter estimation, and efficient structure learning techniques. The strengths of RDNs are due to the use of pseudo-likelihood learning techniques, which estimate an efficient approximation of the full joint distribution. We present learned RDNs for a number of real-world datasets and evaluate the models in a prediction context, showing that RDNs identify and exploit cyclic relational dependencies to achieve significant performance gains over conventional conditional models. In addition, we use synthetic data to explore model performance under various relational data characteristics, showing that RDN learning and inference techniques are accurate over a wide range of conditions.