A Simple Relational Classifier
- Proceedings of the Second Workshop on Multi-Relational Data Mining (MRDM-2003) at KDD-2003
, 2003
Cited by 58 (13 self)
We analyze a Relational Neighbor (RN) classifier, a simple relational predictive model that predicts only based on class labels of related neighbors, using no learning and no inherent attributes. We show that it performs surprisingly well by comparing it to more complex models such as Probabilistic Relational Models and Relational Probability Trees on three data sets from published work.
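The neighbor-voting idea in this abstract can be illustrated with a short sketch. It is not the authors' implementation: the function name `rn_predict`, the edge-weight dictionary, and the toy paper graph are all hypothetical. It scores each class for a node as the weighted fraction of its already-labeled neighbors in that class, with no training step and no node attributes.

```python
from collections import Counter

def rn_predict(node, edges, labels):
    """Score classes for `node` from its labeled neighbors only.

    edges:  dict mapping node -> {neighbor: edge weight}  (assumed layout)
    labels: dict mapping node -> known class label
    Returns {class: weighted fraction of labeled neighbors in that class},
    or an empty dict when no neighbor is labeled.
    """
    votes = Counter()
    for neighbor, weight in edges.get(node, {}).items():
        if neighbor in labels:  # unlabeled neighbors contribute nothing
            votes[labels[neighbor]] += weight
    total = sum(votes.values())
    if total == 0:
        return {}
    return {cls: w / total for cls, w in votes.items()}

# Hypothetical toy graph: p1 is unlabeled and linked to three labeled papers.
edges = {"p1": {"p2": 1.0, "p3": 1.0, "p4": 2.0}}
labels = {"p2": "theory", "p3": "systems", "p4": "theory"}

scores = rn_predict("p1", edges, labels)
prediction = max(scores, key=scores.get)
```

Here `p1` gets a score of 0.75 for "theory" and 0.25 for "systems", so the predicted class is "theory".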
Collective Classification with Relational Dependency Networks
- Journal of Machine Learning Research
, 2003
Cited by 49 (8 self)
In this paper, we present relational dependency networks (RDNs), extending recent work in dependency networks to a relational setting.
Statistical Models and Analysis Techniques for Learning in Relational Data
, 2006
Cited by 9 (0 self)
Many data sets routinely captured by organizations are relational in nature - from marketing and sales transactions, to scientific observations and medical records. Relational data record characteristics of heterogeneous objects and persistent relationships among those objects (e.g., citation graphs, the World Wide Web, genomic structures). These data offer unique opportunities to improve model accuracy, and thereby decision-making, if machine learning techniques can effectively exploit the relational information.
This work focuses on how to learn accurate statistical models of complex, relational data sets and develops two novel probabilistic models to represent, learn, and reason about statistical dependencies in these data. Relational dependency networks are the first relational model capable of learning general autocorrelation dependencies, an important class of statistical dependencies that are ubiquitous in relational data. Latent group models are the first relational model to generalize about the properties of underlying group structures to improve inference accuracy and efficiency. Not only do these two models offer performance gains over current relational models, but they also offer efficiency gains which will make relational modeling feasible for large, relational datasets where current methods are computationally intensive, if not intractable.
We also formulate a novel analysis framework to analyze relational model performance and ascribe errors to model learning and inference procedures. Within this framework, we explore the effects of data characteristics and representation choices on inference accuracy and investigate the mechanisms behind model performance. In particular, we show that the inference process in relational models can be a significant source of error and that relative model performance varies significantly across different types of relational data.
Relational Learning Problems and Simple Models
- IJCAI 2003 Workshop on Learning Statistical Models from Relational Data (SRL-2003)
, 2003
Cited by 6 (4 self)
In recent years, we have seen remarkable advances in algorithms for relational learning, especially statistically based algorithms. These algorithms have been developed in a wide variety of different research fields and problem settings. It is important scientifically to understand the strengths, weaknesses, and applicability of the various methods. However, we are stymied by a lack of a common framework for characterizing relational learning. What are the dimensions along which relational learning problems and potential solutions should be characterized? Jensen (1998) outlined dimensions that are applicable to relational learning, including various measures of size, interconnectivity and variety; items to be characterized include the data, the (true) model, the background knowledge, and so on. Additionally, individual research papers will characterize aspects of relational learning that they are considering and are ignoring. However, there are few studies or even position papers that examine various methods, contrasting them along common dimensions (one notable exception being the paper by Jensen and Neville (2002b)). It also is not clear whether straightforward measures of size, interconnectivity, or variety will be the best dimensions. In this paper we argue that other sorts of dimensions are at least as important. In particular, the aforementioned dimensions characterize the learning problem (i.e., the training data and the true model). Equally important are characteristics of the context for using the learned model—which have important implications for learning. For illustration, let us discuss three context characteristics, and their implications for studying relational learning algorithms.
Using graph-based metrics with empirical risk minimization to speed up active learning on networked data
- Proceedings of the Fifteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
, 2009
Cited by 5 (1 self)
Active and semi-supervised learning are important techniques when labeled data are scarce. Recently a method was suggested for combining active learning with a semi-supervised learning algorithm that uses Gaussian fields and harmonic functions. This classifier is relational in nature: it relies on having the data presented as a partially labeled graph (also known as a within-network learning problem). This work showed yet again that empirical risk minimization (ERM) was the best method to find the next instance to label and provided an efficient way to compute ERM with the semi-supervised classifier. The computational problem with ERM is that it relies on computing the risk for all possible instances. If we can limit the candidates that should be investigated, then we can speed up active learning considerably. In the case where the data is graphical in nature, we can leverage the graph structure to rapidly identify instances that are likely to be good candidates for labeling. This paper describes a novel hybrid approach of using community finding and social network analytic centrality measures to identify good candidates for labeling and then using ERM to find the best instance in this candidate set. We show on real-world data that we can limit the ERM computations to a fraction of instances with comparable performance.
Relational ensemble classification
- IN: ICDM
, 2006
Cited by 5 (2 self)
Relational classification aims at including relations among entities, for example taking relations between documents such as a common author or citations into account. However, considering more than one relation can further improve classification accuracy. In this paper we introduce a new approach to make use of several relations as well as both relations and attributes for classification using ensemble methods. To accomplish this, we present a generic relational ensemble model that can use different relational and local classifiers as components. Furthermore, we discuss solutions for several problems concerning relational data such as heterogeneity, sparsity, and multiple relations. Finally, we provide empirical evidence that our relational ensemble methods outperform existing relational classification methods, even rather complex models such as relational probability trees (RPTs), relational dependency networks (RDNs) and relational Bayesian classifiers (RBCs).
A Bias/Variance Decomposition for Models Using Collective Inference
Cited by 4 (3 self)
Bias/variance analysis is a useful tool for investigating the performance of machine learning algorithms. Conventional analysis decomposes loss into errors due to aspects of the learning process, but in relational domains, the inference process used for prediction introduces an additional source of error. Collective inference techniques introduce additional error, both through the use of approximate inference algorithms and through variation in the availability of test-set information. To date, the impact of inference error on model performance has not been investigated. We propose a new bias/variance framework that decomposes loss into errors due to both the learning and inference processes. We evaluate the performance of three relational models on both synthetic and real-world datasets and show that (1) inference can be a significant source of error, and (2) the models exhibit different types of errors as data characteristics are varied.
NetKit-SRL: A Toolkit for Network Learning and Inference -- and its use for classification of networked data
- Proc. Ann. Conf. North Am. Assoc. Computational Social and Organizational Science (NAACSOS)
, 2005
Cited by 3 (0 self)
This paper describes NetKit-SRL, or NetKit for short, a toolkit for learning from and classifying networked data. The toolkit is open-source and publicly available. It is modular and built for ease of plug-and-play, so that it is easy to add new modules and have them interact with other existing modules. Currently available NetKit modules are focused on "batch" within-network learning and classification: given a partially labeled network, where all nodes and edges are already known to exist, estimate the class membership probability of the unlabeled nodes in the network. NetKit has been used in various network domains such as websites, citation graphs, movies and social networks.
A brief survey of machine learning methods for classification in networked data and an application to suspicion scoring
, 2006
A shrinkage approach for modeling non-stationary relational autocorrelation
- In ICDM ’08: Proceedings of the 2008 Eighth IEEE International Conference on Data Mining. IEEE Computer Society
Cited by 2 (0 self)
Recent research has shown that collective classification in relational data often exhibits significant performance gains over conventional approaches that classify instances individually. This is primarily due to the presence of autocorrelation in relational datasets, which means that the class labels of related entities are correlated and inferences about one instance can be used to improve inferences about linked instances. Statistical relational learning techniques exploit relational autocorrelation by modeling global autocorrelation dependencies under the assumption that the level of autocorrelation is stationary throughout the dataset. To date, there has been no work examining the appropriateness of this stationarity assumption. In this paper, we examine two real-world datasets and show that there is significant variance in the autocorrelation dependencies throughout the relational data graphs. To account for this, we develop a technique for modeling non-stationary autocorrelation in relational data. We compare to two baseline techniques which model either the local or the global autocorrelation dependencies in isolation and show that a shrinkage model results in significantly improved model accuracy.
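As a rough illustration of the local-versus-global tension this abstract describes, the sketch below measures label agreement across edges and pulls a local estimate toward the global one. It is a generic pseudo-count shrinkage under stated assumptions, not the technique developed in the paper. The function names, the `strength` parameter, and the toy graph are hypothetical.

```python
def label_agreement(edges, labels):
    """Fraction of edges whose two labeled endpoints share a class.

    A crude stand-in for relational autocorrelation; returns None when
    no edge has both endpoints labeled.
    """
    same = total = 0
    for u, v in edges:
        if u in labels and v in labels:
            total += 1
            same += labels[u] == labels[v]
    return same / total if total else None

def shrunk_estimate(local_same, local_total, global_rate, strength=5.0):
    """Blend a local agreement rate with the global rate.

    `strength` acts as a pseudo-count (an assumption of this sketch):
    the fewer local edges there are, the closer the result stays to
    the global rate.
    """
    return (local_same + strength * global_rate) / (local_total + strength)

# Hypothetical chain graph a-b-c-d-e with two classes.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
labels = {"a": "x", "b": "x", "c": "x", "d": "y", "e": "y"}

global_rate = label_agreement(edges, labels)              # 3 of 4 edges agree
local_edges = [e for e in edges if "d" in e]              # edges touching node d
local_rate = label_agreement(local_edges, labels)         # 1 of 2 edges agree
estimate = shrunk_estimate(1, len(local_edges), global_rate)
```

With only two edges around node `d`, the blended estimate lands between the local rate (0.5) and the global rate (0.75), closer to the global one.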

