A Simple Relational Classifier
Proceedings of the Second Workshop on Multi-Relational Data Mining (MRDM-2003) at KDD-2003, 2003
Cited by 58 (13 self)
We analyze a Relational Neighbor (RN) classifier, a simple relational predictive model that predicts only based on class labels of related neighbors, using no learning and no inherent attributes. We show that it performs surprisingly well by comparing it to more complex models such as Probabilistic Relational Models and Relational Probability Trees on three data sets from published work.
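The idea in this abstract, predicting a node's class only from the class labels of its related neighbors, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `relational_neighbor_predict`, the adjacency-dictionary input, the unweighted vote, and the alphabetical tie-break are all hypothetical choices made for the example.

```python
from collections import Counter


def relational_neighbor_predict(neighbors, known_labels, node):
    """Estimate the class of `node` from the labels of its neighbors.

    neighbors:    dict mapping a node to an iterable of related nodes (assumed format)
    known_labels: dict mapping some nodes to their known class label
    Returns (predicted_label, class_distribution); (None, {}) if no
    neighbor has a known label.
    """
    # Count class labels among neighbors whose label is known.
    counts = Counter(
        known_labels[n] for n in neighbors.get(node, ()) if n in known_labels
    )
    total = sum(counts.values())
    if total == 0:
        return None, {}
    # Relative frequency of each class among the labeled neighbors.
    dist = {label: c / total for label, c in counts.items()}
    # Ties are broken alphabetically (an arbitrary choice for this sketch).
    best = max(sorted(dist), key=dist.get)
    return best, dist


# Toy data: p1 is linked to three labeled nodes, p5 has no links.
neighbors = {"p1": ["p2", "p3", "p4"], "p5": []}
known_labels = {"p2": "ml", "p3": "ml", "p4": "db"}
label, dist = relational_neighbor_predict(neighbors, known_labels, "p1")
```

No model is trained and no node attributes are consulted; the only inputs are the links and the known labels, which matches the "no learning and no inherent attributes" description in the abstract.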
Learning Statistical Models from Relational Data
2001
Cited by 33 (6 self)
This workshop is the second in a series of workshops held in conjunction with AAAI and IJCAI. The first workshop was held in July, 2000 at AAAI. Notes from that workshop are available at
Distribution-based aggregation for relational learning with identifier attributes
Machine Learning, 2004
Cited by 22 (10 self)
Feature construction through aggregation plays an essential role in modeling relational domains with one-to-many relationships between tables. One-to-many relationships lead to bags (multisets) of related entities, from which predictive information must be captured. This paper focuses on aggregation from categorical attributes that can take many values (e.g., object identifiers). We present a novel aggregation method as part of a relational learning system, ACORA, that combines the use of vector distance and meta-data about the class-conditional distributions of attribute values. We provide a theoretical foundation for this approach, deriving a “relational fixed-effect” model within a Bayesian framework, and discuss the implications of identifier aggregation on the expressive power of the induced model. One advantage of using identifier attributes is the circumvention of limitations caused either by missing/unobserved object properties or by independence assumptions. Finally, we show empirically that the novel aggregators can generalize in the presence of identifier (and other high-dimensional) attributes, and also explore the limitations of the applicability of the methods.
Finding tribes: Identifying close-knit individuals from employment patterns
In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2007
Cited by 10 (2 self)
We present a family of algorithms to uncover tribes: groups of individuals who share unusual sequences of affiliations. While much work inferring community structure describes large-scale trends, we instead search for small groups of tightly linked individuals who behave anomalously with respect to those trends. We apply the algorithms to a large temporal and relational data set consisting of millions of employment records from the National Association of Securities Dealers. The resulting tribes contain individuals at higher risk for fraud, are homogeneous with respect to risk scores, and are geographically mobile, all at significant levels compared to random or to other sets of individuals who share affiliations.
You Are Who You Know: Inferring User Profiles in Online Social Networks
Cited by 10 (6 self)
Online social networks are now a popular way for users to connect, express themselves, and share content. Users in today’s online social networks often post a profile, consisting of attributes like geographic location, interests, and schools attended. Such profile information is used on the sites as a basis for grouping users, for sharing content, and for suggesting users who may benefit from interaction. However, in practice, not all users provide these attributes. In this paper, we ask the question: given attributes for some fraction of the users in an online social network, can we infer the attributes of the remaining users? In other words, can the attributes of users, in combination with the social network graph, be used to predict the attributes of another user in the network? To answer this question, we gather fine-grained data from two social networks and try to infer user profile attributes. We find that users with common attributes are more likely to be friends and often form dense communities, and we propose a method of inferring user attributes that is inspired by previous approaches to detecting communities in social networks. Our results show that certain user attributes can be inferred with high accuracy when given information on as little as 20% of the users.
Knowledge representation issues in semantic graphs for relationship detection
In AAAI Spring Symposium, 2005. Recall Rank, 2005
Cited by 8 (3 self)
An important task for Homeland Security is the prediction of threat vulnerabilities, such as through the detection of relationships between seemingly disjoint entities. A structure used for this task is a semantic graph, also known as a relational data graph or an attributed relational graph. These graphs encode relationships as typed links between a pair of typed nodes. Indeed, semantic graphs are very similar to semantic networks used in AI. The node and link types are related through an ontology graph (also known as a schema). Furthermore, each node has a set of attributes associated with it (e.g., “age” may be an attribute of a node of type “person”). Unfortunately, the selection of types and attributes for both nodes and links depends on human expertise and is somewhat subjective and even arbitrary. This subjectiveness introduces biases into any algorithm that operates on semantic graphs. Here, we raise some knowledge representation issues for semantic graphs and provide some possible solutions using recently developed ideas in the field of complex networks. In particular, we use the concept of transitivity to evaluate the relevance of individual links in the semantic graph for detecting relationships. We also propose new statistical measures for semantic graphs and illustrate these semantic measures on graphs constructed from movies and terrorism data.
Representing, querying and transforming social networks with RDF/SPARQL
Semantic Web: Research and Applications, 2009
Cited by 3 (0 self)
As social networks are becoming ubiquitous on the Web, the Semantic Web goals indicate that it is critical to have a standard model allowing exchange, interoperability, transformation, and querying of social network data. In this paper we show that RDF/SPARQL meet these desiderata. Building on developments in social network analysis, graph databases, and the Semantic Web, we present a social networks data model based on RDF, and a query and transformation language based on SPARQL meeting the above requirements. We study its expressive power and complexity, showing that it behaves well, and present an illustrative prototype.
Classification in networked data
2006
Cited by 1 (0 self)
This paper is about classifying entities that are interlinked with entities for which the class is known. After surveying prior work, we present NetKit, a modular toolkit for classification in networked data, and a case study of its application to networked data used in prior machine learning research. NetKit is based on a node-centric framework in which classifiers comprise a local classifier, a relational classifier, and a collective inference procedure. Various existing node-centric relational learning algorithms can be instantiated with appropriate choices for these components, and new combinations of components realize new algorithms. The case study focuses on univariate network classification, for which the only information used is the structure of class linkage in the network (i.e., only links and some class labels). To our knowledge, no previous work has systematically evaluated the power of class linkage alone for classification in machine learning benchmark data sets. The results demonstrate that very simple network-classification models perform quite well, well enough that they should be used regularly as baseline classifiers for studies of learning with networked data. The simplest method (which performs remarkably well) highlights the close correspondence between several existing methods introduced for different purposes, namely Gaussian-field classifiers, Hopfield networks, and relational-neighbor classifiers. The case study also shows that there are two sets of techniques that are preferable in different situations, namely when few versus many labels are known initially. We also demonstrate that link selection plays an important role similar to traditional feature selection.
Towards Discovering Organizational Structure from Email Corpus
Email logs people's communication history, which provides valuable information regarding the infrastructure of an organization. In this paper, a two-phase framework is introduced to attack the problem of leadership discovery in an organization based on email communication history among the employees. Two heuristic metrics are proposed for evaluating pair-wise leadership factors among a group of employees. We also address several issues in discovering the organization's structure through mining the leadership graph constructed from the leadership factors. Experimental studies are carried out by applying the framework to the Enron email corpus.
Cornell/Stanford University
How do blogs produce posts? What local, underlying mechanisms lead to the bursty temporal behaviors observed in blog networks? Earlier work analyzed network patterns of blogs and found that blog behavior is bursty and often follows power laws in both topological and temporal characteristics. However, no intuitive and realistic model has yet been introduced that can lead to such patterns. This is exactly the focus of this work. We propose a generative model that uses simple and intuitive principles for each individual blog, and yet is able to produce the temporal characteristics of the blogosphere together with global topological network patterns, like power laws for degree distributions, for inter-posting times, and several more. Our model, ZC, uses a novel ‘zero-crossing’ approach based on a random walk, combined with other powerful ideas like exploration and exploitation. This makes it the first model to simultaneously model the topology and temporal dynamics of the blogosphere. We validate our model with experiments on a large collection of 45,000 blogs and 2.2 million posts.

