Topic and role discovery in social networks
- In IJCAI, 2005
Cited by 109 (12 self)
Previous work in social network analysis (SNA) has modeled the existence of links from one entity to another, but not the language content or topics on those links. We present the Author-Recipient-Topic (ART) model for social network analysis, which learns topic distributions based on the direction-sensitive messages sent between entities. The model builds on Latent Dirichlet Allocation (LDA) and the Author-Topic (AT) model, adding the key attribute that the distribution over topics is conditioned distinctly on both the sender and recipient, steering the discovery of topics according to the relationships between people. We give results on both the Enron email corpus and a researcher’s email archive, providing evidence not only that clearly relevant topics are discovered, but that the ART model better predicts people’s roles.
Mixed membership stochastic block models for relational data with application to protein-protein interactions
- In Proceedings of the International Biometrics Society Annual Meeting, 2006
Cited by 97 (22 self)
We develop a model for examining data that consist of pairwise measurements, for example, presence or absence of links between pairs of objects. Examples include protein interactions and gene regulatory networks, collections of author-recipient email, and social networks. Analyzing such data with probabilistic models requires special assumptions, since the usual independence or exchangeability assumptions no longer hold. We introduce a class of latent variable models for pairwise measurements: mixed membership stochastic blockmodels. Models in this class combine a global model of dense patches of connectivity (blockmodel) and a local model to instantiate node-specific variability in the connections (mixed membership). We develop a general variational inference algorithm for fast approximate posterior inference. We demonstrate the advantages of mixed membership stochastic blockmodels with applications to social networks and protein interaction networks.
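As a rough illustration of the two components named in this abstract, the sketch below samples a small directed network from one common formulation of a mixed membership stochastic blockmodel: each node gets its own membership vector (the local, mixed-membership part), and a block matrix gives the link probability for each pair of groups (the global blockmodel part). The function names, the Dirichlet prior on memberships, and the parameter values are assumptions made for illustration, not details taken from the listing, and the sketch covers only the generative side, not the variational inference algorithm.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one probability vector from Dirichlet(alpha) via normalized gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_categorical(probs, rng):
    """Draw an index k with probability probs[k]."""
    u = rng.random()
    cumulative = 0.0
    for k, p in enumerate(probs):
        cumulative += p
        if u < cumulative:
            return k
    return len(probs) - 1  # guard against floating-point round-off


def sample_mmsb(n_nodes, alpha, block, seed=0):
    """Sample memberships and a directed 0/1 link matrix (hypothetical helper).

    alpha: Dirichlet parameters, one per latent group (assumed prior).
    block: block[g][h] is the probability of a link from group g to group h.
    """
    rng = random.Random(seed)
    # Local model: every node has its own distribution over groups.
    memberships = [sample_dirichlet(alpha, rng) for _ in range(n_nodes)]
    links = [[0] * n_nodes for _ in range(n_nodes)]
    for p in range(n_nodes):
        for q in range(n_nodes):
            if p == q:
                continue  # no self-links in this sketch
            # Each pair picks fresh roles for sender and receiver.
            z_send = sample_categorical(memberships[p], rng)
            z_recv = sample_categorical(memberships[q], rng)
            # Global model: the block matrix sets the link probability.
            links[p][q] = 1 if rng.random() < block[z_send][z_recv] else 0
    return memberships, links


if __name__ == "__main__":
    pi, y = sample_mmsb(6, [0.3, 0.3], [[0.9, 0.05], [0.05, 0.8]], seed=42)
    for row in y:
        print(row)
```

Small values in `alpha` push each node toward a single dominant group, which tends to produce the dense patches of connectivity that the abstract describes; larger values spread membership across groups and blur those patches.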
Leveraging relational autocorrelation with latent group models
- In MRDM '05: Proceedings of the 4th international workshop on Multi-relational mining. ACM
Cited by 43 (14 self)
The presence of autocorrelation provides strong motivation for using relational techniques for learning and inference. Autocorrelation is a statistical dependency between the values of the same variable on related entities and is a nearly ubiquitous characteristic of relational data sets. Recent research has explored the use of collective inference techniques to exploit this phenomenon. These techniques achieve significant performance gains by modeling observed correlations among class labels of related instances, but the models fail to capture a frequent cause of autocorrelation: the presence of underlying groups that influence the attributes on a set of entities. We propose a latent group model (LGM) for relational data, which discovers and exploits the hidden structures responsible for the observed autocorrelation among class labels. Modeling the latent group structure improves model performance, increases inference efficiency, and enhances our understanding of the datasets. We evaluate performance on three relational classification tasks and show that LGM outperforms models that ignore latent group structure when there is little known information with which to seed inference.
Link Mining: A Survey
- SigKDD Explorations Special Issue on Link Mining, 2005
Cited by 31 (0 self)
Many datasets of interest today are best described as a linked collection of interrelated objects. These may represent homogeneous networks, in which there is a single object type and link type, or richer, heterogeneous networks, in which there may be multiple object and link types (and possibly other semantic information). Examples of homogeneous networks include single-mode social networks, such as people connected by friendship links, or the WWW, a collection of linked web pages. Examples of heterogeneous networks include those in medical domains describing patients, diseases, treatments and contacts, or in bibliographic domains describing publications, authors, and venues. Link mining refers to data mining techniques that explicitly consider these links when building predictive or descriptive models of the linked data. Commonly addressed link mining tasks include object ranking, group detection, collective classification, link prediction and subgraph discovery. While network analysis has been studied in depth in particular areas such as social network analysis, hypertext mining, and web analysis, only recently has there been a cross-fertilization of ideas among these different communities. This is an exciting, rapidly expanding area. In this article, we review some of the common emerging themes.
Relational topic models for document networks
- In Proc. of Conf. on AI and Statistics (AISTATS)
Cited by 30 (2 self)
We develop the relational topic model (RTM), a model of documents and the links between them. For each pair of documents, the RTM models their link as a binary random variable that is conditioned on their contents. The model can be used to summarize a network of documents, predict links between them, and predict words within them. We derive efficient inference and learning algorithms based on variational methods and evaluate the predictive performance of the RTM for large networks of scientific abstracts and web documents.
Community Evolution in Dynamic Multi-Mode Networks
- KDD'08, 2008
Cited by 20 (8 self)
A multi-mode network typically consists of multiple heterogeneous social actors among which various types of interactions could occur. Identifying communities in a multi-mode network can help understand the structural properties of the network, address data shortage and imbalance problems, and assist tasks like targeted marketing and finding influential actors within or between groups. In general, a network and the membership of its groups often evolve gradually. In a dynamic multi-mode network, both actor membership and interactions can evolve, which poses the challenging problem of identifying community evolution. In this work, we try to address this issue by employing temporal information to analyze a multi-mode network. A spectral framework and its scalability are carefully studied. Experiments on both synthetic data and large-scale real-world networks demonstrate the efficacy of our algorithm and suggest its generality in solving problems with complex relationships.
Group and topic discovery from relations and their attributes
- In NIPS, 2006
Cited by 17 (3 self)
We present a probabilistic generative model of entity relationships and their attributes that simultaneously discovers groups among the entities and topics among the corresponding textual attributes. Block-models of relationship data have been studied in social network analysis for some time. Here we cluster in several modalities at once, incorporating the attributes (here, words) associated with certain relationships. Significantly, joint inference allows the discovery of topics to be guided by the emerging groups, and vice-versa. We present experimental results on two large data sets: sixteen years of bills put before the U.S. Senate, comprising their corresponding text and voting records, and thirteen years of similar data from the United Nations. We show that in comparison with traditional, separate latent-variable models for words, or Blockstructures for votes, the Group-Topic model’s joint inference discovers more cohesive groups and improved topics.
Structured priors for structure learning
- In Proceedings of the 22nd Conference on Uncertainty in Artificial Intelligence (UAI), 2006
Cited by 17 (7 self)
Traditional approaches to Bayes net structure learning typically assume little regularity in graph structure other than sparseness. However, in many cases, we expect more systematicity: variables in real-world systems often group into classes that predict the kinds of probabilistic dependencies they participate in. Here we capture this form of prior knowledge in a hierarchical Bayesian framework, and exploit it to enable structure learning and type discovery from small datasets. Specifically, we present a nonparametric generative model for directed acyclic graphs as a prior for Bayes net structure learning. Our model assumes that variables come in one or more classes and that the prior probability of an edge existing between two variables is a function only of their classes. We derive an MCMC algorithm for simultaneous inference of the number of classes, the class assignments of variables, and the Bayes net structure over variables. For several realistic, sparse datasets, we show that the bias towards systematicity of connections provided by our model can yield more accurate learned networks than the traditional approach of using a uniform prior, and that the classes found by our model are appropriate.
Learning annotated hierarchies from relational data
- In Advances in Neural Information Processing Systems, 2006
Cited by 13 (4 self)
The objects in many real-world domains can be organized into hierarchies, where each internal node picks out a category of objects. Given a collection of features and relations defined over a set of objects, an annotated hierarchy includes a specification of the categories that are most useful for describing each individual feature and relation. We define a generative model for annotated hierarchies and the features and relations that they describe, and develop a Markov chain Monte Carlo scheme for learning annotated hierarchies. We show that our model discovers interpretable structure in several real-world data sets.
A Latent Mixed Membership Model for Relational Data
- In LinkKDD ’05: Proceedings of the 3rd International Workshop on Link Discovery, 2005
Cited by 13 (4 self)
... data analysis and machine learning. In this paper we propose a Bayesian model that uses a hierarchy of probabilistic assumptions about the way objects interact with one another in order to learn latent groups, their typical interaction patterns, and the degree of membership of objects to groups. Our model explains the data using a small set of parameters that can be reliably estimated with an efficient inference algorithm. In our approach, the set of probabilistic assumptions may be tailored to a specific application domain in order to incorporate intuitions and/or semantics of interest. We demonstrate our methods on simulated data and we successfully apply our model to a data set of protein-to-protein interactions.

