The link-prediction problem for social networks
- J. American Society for Information Science and Technology
Cited by 269 (4 self)
Given a snapshot of a social network, can we infer which new interactions among its members are likely to occur in the near future? We formalize this question as the link-prediction problem, and we develop approaches to link prediction based on measures for analyzing the “proximity” of nodes in a network. Experiments on large co-authorship networks suggest that information about future interactions can be extracted from network topology alone, and that fairly subtle measures for detecting node proximity can outperform more direct measures.
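The abstract does not list the proximity measures themselves. As a hedged illustration only, this minimal Python sketch implements three standard neighborhood-based scores (common neighbors, Jaccard coefficient, Adamic/Adar) and uses them to rank currently unlinked pairs in a snapshot. The toy graph and every function name are hypothetical, and this is not presented as the paper's full set of measures.

```python
import math
from itertools import combinations

# Hypothetical toy co-authorship snapshot as an undirected adjacency map.
ADJ = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}

def common_neighbors(adj, u, v):
    """Number of neighbors shared by u and v."""
    return len(adj[u] & adj[v])

def jaccard(adj, u, v):
    """Shared neighbors as a fraction of all neighbors of u or v."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0

def adamic_adar(adj, u, v):
    """Shared neighbors weighted by 1/log(degree), so rare neighbors count more.
    A shared neighbor always has degree >= 2, so log(degree) > 0."""
    return sum(1.0 / math.log(len(adj[z])) for z in adj[u] & adj[v])

def rank_candidate_links(adj, score, k=3):
    """Score every currently unlinked pair; return the top k as (u, v, score)."""
    scored = [(score(adj, u, v), u, v)
              for u, v in combinations(sorted(adj), 2) if v not in adj[u]]
    scored.sort(key=lambda t: (-t[0], t[1], t[2]))  # highest first, ties by name
    return [(u, v, s) for s, u, v in scored[:k]]

print(rank_candidate_links(ADJ, common_neighbors))
# [('a', 'd', 2), ('b', 'e', 1), ('c', 'e', 1)]
```

Passing `jaccard` or `adamic_adar` instead of `common_neighbors` changes only the scoring function, so the ranking loop stays the same for every measure.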
Statistical Relational Learning for Document Mining
, 2003
Cited by 35 (5 self)
A major obstacle to fully integrated deployment of statistical learners is the assumption that data sits in a single table, even though most real-world databases have complex relational structures. In this paper, we introduce an integrated approach to building regression models from data stored in relational databases. Potential features are generated by structured search of the space of queries to the database, and then tested for inclusion in a logistic regression. We present experimental results for the task of predicting where scientific papers will be published based on relational data taken from CiteSeer. This data includes word counts in the document, frequently cited authors or papers, co-citations, publication venues of cited papers, word co-occurrences, and word counts in cited or citing documents. Our approach results in classification accuracies superior to those achieved when using classical "flat" features. Our classification task also serves as a "where to publish?" conference/journal recommendation task.
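A rough sketch of the idea, not the paper's implementation: the Python below instantiates one hypothetical query template ("how many papers cited by this paper appeared at venue v") once per venue, and feeds the resulting counts to a logistic regression trained by plain gradient descent. The tables, venue names, labels and helper names are all invented for illustration. The paper's structured search over many query refinements and its feature-inclusion test are not shown.

```python
import math

# Hypothetical toy database: cites(citing, cited) and venue(paper) tables.
CITES = [("p1", "q1"), ("p1", "q2"), ("p2", "q3"),
         ("p3", "q1"), ("p4", "q3"), ("p4", "q4")]
VENUE_OF = {"q1": "KDD", "q2": "KDD", "q3": "SIGIR", "q4": "SIGIR"}
VENUES = ["KDD", "SIGIR"]

def cited_venue_counts(paper, venues):
    """One feature per venue v: count of Q with cites(paper, Q) and venue(Q) = v."""
    cited = [q for p, q in CITES if p == paper]
    return [sum(1 for q in cited if VENUE_OF.get(q) == v) for v in venues]

def predict(w, x):
    """Logistic model; the last weight is the intercept."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + w[-1]
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

# Hypothetical labels: 1 if the paper was published at KDD, else 0.
PAPERS, LABELS = ["p1", "p2", "p3", "p4"], [1, 0, 1, 0]
X = [cited_venue_counts(p, VENUES) for p in PAPERS]
w = train_logistic(X, LABELS)
print(X)  # [[2, 0], [0, 1], [1, 0], [0, 2]]
```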
Link Mining: A Survey
- SigKDD Explorations Special Issue on Link Mining
, 2005
Cited by 31 (0 self)
Many datasets of interest today are best described as a linked collection of interrelated objects. These may represent homogeneous networks, in which there is a single object type and link type, or richer, heterogeneous networks, in which there may be multiple object and link types (and possibly other semantic information). Examples of homogeneous networks include single-mode social networks, such as people connected by friendship links, or the WWW, a collection of linked web pages. Examples of heterogeneous networks include those in medical domains describing patients, diseases, treatments and contacts, or in bibliographic domains describing publications, authors, and venues. Link mining refers to data mining techniques that explicitly consider these links when building predictive or descriptive models of the linked data. Commonly addressed link mining tasks include object ranking, group detection, collective classification, link prediction and subgraph discovery. While network analysis has been studied in depth in particular areas such as social network analysis, hypertext mining, and web analysis, only recently has there been a cross-fertilization of ideas among these different communities. This is an exciting, rapidly expanding area. In this article, we review some of the common emerging themes.
Cluster-based Concept Invention for Statistical Relational Learning
- Proceedings of the 10th SIGKDD
, 2004
Cited by 19 (4 self)
We use clustering to derive new relations which augment the database schema used in the automatic generation of predictive features in statistical relational learning. Entities derived from clusters increase the expressivity of feature spaces by creating new first-class concepts which contribute to the creation of new features. For example, in CiteSeer, papers can be clustered based on words or citations giving "topics", and authors can be clustered based on documents they coauthor giving "communities". Such cluster-derived concepts become part of more complex feature expressions. Out of the large number of generated features, those which improve predictive accuracy are kept in the model, as decided by statistical feature selection criteria. We present results demonstrating improved accuracy on two tasks, venue prediction and link prediction, using CiteSeer data.
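A minimal sketch of cluster-derived concepts, under loudly stated assumptions: connected components of a toy co-authorship list stand in for whatever clustering the paper actually uses, and the resulting community id becomes a new relation from which extra pair features can be built. All names and data are hypothetical.

```python
from collections import Counter

# Hypothetical co-authorship pairs; the paper clusters on documents, words or citations.
COAUTHORED = [("alice", "bob"), ("bob", "carol"), ("dave", "erin")]

def find_communities(edges):
    """Stand-in clustering: connected components via union-find.
    Returns the new relation community(author) -> integer cluster id."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra
    ids, community = {}, {}
    for author in sorted(parent):
        # Number clusters 0, 1, ... in order of their first (sorted) member.
        community[author] = ids.setdefault(find(author), len(ids))
    return community

def pair_features(community, a, b):
    """Features that only exist once the cluster-derived relation is available."""
    sizes = Counter(community.values())
    return {"same_community": int(community[a] == community[b]),
            "community_size_a": sizes[community[a]]}

COMMUNITY = find_communities(COAUTHORED)
print(pair_features(COMMUNITY, "alice", "carol"))
# {'same_community': 1, 'community_size_a': 3}
```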
Structural Logistic Regression for Link Analysis
, 2003
Cited by 17 (4 self)
We present Structural Logistic Regression, an extension of logistic regression to modeling relational data. It is an integrated approach to building regression models from data stored in relational databases in which potential predictors, both boolean and real-valued, are generated by structured search in the space of queries to the database, and then tested with statistical information criteria for inclusion in a logistic regression. Using statistics and relational representation allows modeling in noisy domains with complex structure. Link prediction is a task of high interest with exactly such characteristics. Be it in the domain of scientific citations, social networks or hypertext, the underlying data are extremely noisy and the features useful for prediction are not readily available in a "flat" file format. We propose the application of Structural Logistic Regression to building link prediction models, and present experimental results for the task of predicting citations made in scientific literature using relational data taken from the CiteSeer search engine. This data includes the citation graph, authorship and publication venues of papers, as well as their word content.
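The abstract says candidate predictors are admitted by statistical information criteria. As one hedged illustration, this sketch uses the Bayesian Information Criterion (BIC = k ln n - 2 ln L) in a greedy forward search over two made-up candidate features. The paper's exact criterion and search procedure may differ, and the data and function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=3000):
    """Batch gradient descent on the mean log-loss; the last weight is the intercept."""
    d, n = len(X[0]), len(X)
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = sigmoid(sum(a * b for a, b in zip(w, xi)) + w[-1]) - yi
            for j in range(d):
                grad[j] += err * xi[j]
            grad[-1] += err
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w

def log_likelihood(w, X, y):
    total = 0.0
    for xi, yi in zip(X, y):
        p = sigmoid(sum(a * b for a, b in zip(w, xi)) + w[-1])
        total += math.log(p) if yi else math.log(1.0 - p)
    return total

def bic(columns, y):
    """BIC = k * ln(n) - 2 * ln L, where k also counts the intercept."""
    X = [list(row) for row in zip(*columns)] if columns else [[] for _ in y]
    w = fit_logistic(X, y)
    return len(w) * math.log(len(y)) - 2.0 * log_likelihood(w, X, y)

def forward_select(features, y):
    """Greedily add the candidate feature that lowers BIC most; stop when none does."""
    selected, best = [], bic([], y)
    while True:
        trials = [(bic([features[f] for f in selected + [name]], y), name)
                  for name in features if name not in selected]
        if not trials:
            return selected
        score, name = min(trials)
        if score >= best:
            return selected
        selected.append(name)
        best = score

# Hypothetical candidate features (as if produced by the query search) and labels.
Y = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0] + [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
FEATURES = {
    "informative": [1] * 10 + [0] * 10,
    "noise": [1, 1, 1, 1, 0, 0, 0, 0, 1, 0] + [1, 0, 1, 1, 1, 1, 0, 0, 0, 0],
}
print(forward_select(FEATURES, Y))  # ['informative']
```

The "noise" column is independent of the label within each value of "informative", so adding it leaves the likelihood unchanged and the ln n penalty rejects it.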
Link prediction approach to collaborative filtering
- In Proceedings of the Joint Conference on Digital Libraries (JCDL05). ACM
, 2005
Cited by 15 (1 self)
Recommender systems can provide valuable services in a digital library environment, as demonstrated by their commercial success in the book, movie, and music industries. One of the most commonly used and successful recommendation algorithms is collaborative filtering, which explores the correlations within user-item interactions to infer user interests and preferences. However, the recommendation quality of collaborative filtering approaches is greatly limited by the data sparsity problem. To alleviate this problem we have previously proposed graph-based algorithms to explore transitive user-item associations. In this paper, we extend the idea of analyzing user-item interactions as graphs and employ link prediction approaches proposed in the recent network modeling literature for making collaborative filtering recommendations. We have adapted a wide range of linkage measures for making recommendations. Our preliminary experimental results based on a book recommendation dataset show that some of these measures achieved significantly better performance than standard collaborative filtering algorithms.
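In a user-item bipartite graph a user and an item never share a neighbor (users link only to items and items only to users), so a plain common-neighbor count is always zero. One natural adaptation, sketched here as an assumption rather than the measure the paper found best, scores an unseen item by the number of length-3 paths from the user to it. The data and the `recommend` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical user -> borrowed-books data, viewed as a bipartite graph.
USER_ITEMS = {
    "u1": {"b1", "b2"},
    "u2": {"b1", "b3"},
    "u3": {"b2", "b3", "b4"},
    "u4": {"b4"},
}

def recommend(user_items, user, k=2):
    """Rank items the user has not linked to by the number of length-3 paths
    user -> item -> other user -> candidate item."""
    item_users = defaultdict(set)
    for u, items in user_items.items():
        for i in items:
            item_users[i].add(u)
    owned = user_items[user]
    scores = defaultdict(int)
    for i in owned:
        for v in item_users[i] - {user}:
            for j in user_items[v] - owned:
                scores[j] += 1
    return sorted(scores.items(), key=lambda t: (-t[1], t[0]))[:k]

print(recommend(USER_ITEMS, "u1"))  # [('b3', 2), ('b4', 1)]
```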
Supervised Random Walks: Predicting and Recommending Links in Social Networks
Cited by 15 (0 self)
Predicting the occurrence of links is a fundamental problem in networks. In the link prediction problem we are given a snapshot of a network and would like to infer which interactions among existing members are likely to occur in the near future or which existing interactions we are missing. Although this problem has been extensively studied, the challenge of how to effectively combine the information from the network structure with rich node and edge attribute data remains largely open. We develop an algorithm based on Supervised Random Walks that naturally combines the information from the network structure with node and edge level attributes. We achieve this by using these attributes to guide a random walk on the graph. We formulate a supervised learning task where the goal is to learn a function that assigns strengths to edges in the network such that a random walker is more likely to visit the nodes to which new links will be created in the future. We develop an efficient training algorithm to directly learn the edge strength estimation function. Our experiments on the Facebook social graph and large collaboration networks show that our approach outperforms state-of-the-art unsupervised approaches as well as approaches that are based on feature extraction.
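The scoring half of this idea can be sketched as follows: edge features are mapped to strengths by a parameterized function (here an assumed exponential, exp(w . psi)), and nodes are ranked by a random walk that restarts at the source. The weight vector is fixed by hand rather than learned, so the paper's training algorithm is omitted, and all names and data are hypothetical.

```python
import math

def edge_strength(features, w):
    """Assumed exponential edge-strength function a_uv = exp(w . psi_uv)."""
    return math.exp(sum(wi * fi for wi, fi in zip(w, features)))

def random_walk_with_restart(edges, w, source, alpha=0.15, iters=100):
    """Power iteration for a walk that follows out-edges in proportion to their
    strength and jumps back to `source` with probability alpha at each step."""
    nodes = sorted({u for u, _ in edges} | {v for _, v in edges})
    out = {u: [] for u in nodes}
    for (u, v), feats in edges.items():
        out[u].append((v, edge_strength(feats, w)))
    p = {u: 0.0 for u in nodes}
    p[source] = 1.0
    for _ in range(iters):
        nxt = {u: 0.0 for u in nodes}
        nxt[source] += alpha
        for u in nodes:
            total = sum(a for _, a in out[u])
            if total == 0.0:  # no out-edges: return this mass to the source
                nxt[source] += (1 - alpha) * p[u]
                continue
            for v, a in out[u]:
                nxt[v] += (1 - alpha) * p[u] * a / total
        p = nxt
    return p

# Hypothetical directed edges with one feature each; only s -> a carries a signal.
EDGES = {("s", "a"): [1.0], ("a", "s"): [0.0], ("s", "b"): [0.0], ("b", "s"): [0.0],
         ("a", "x"): [0.0], ("x", "a"): [0.0], ("b", "y"): [0.0], ("y", "b"): [0.0]}
scores = random_walk_with_restart(EDGES, w=[2.0], source="s")
print(scores["x"] > scores["y"])  # True: the weight steers the walk toward x
```

With `w=[0.0]` every edge has strength 1 and the walk is unweighted, so x and y receive identical scores; this is the sense in which the learned weights let node and edge attributes shape the ranking.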
Combining collective classification and link prediction
- In Workshop on Mining Graphs and Complex Structures at the IEEE International Conference on Data Mining
, 2007
Cited by 13 (4 self)
The problems of object classification (labeling the nodes of a graph) and link prediction (predicting the links in a graph) have been largely studied independently. Commonly, object classification is performed assuming a complete set of known links, and link prediction is done assuming a fully observed set of node attributes. In most real-world domains, however, attributes and links are often missing or incorrect. Object classification is not provided with all the links relevant to correct classification, and link prediction is not provided with all the labels needed for accurate link prediction. In this paper, we propose an approach that addresses these two problems by interleaving object classification and link prediction in a collective algorithm. We investigate empirically the conditions under which an integrated approach to object classification and link prediction improves performance, and find that performance improves over a wide range of network types and algorithm settings.
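The abstract does not spell out the algorithm. The toy loop below is written under our own assumptions (a majority-vote relational classifier, and a rule that links same-label pairs sharing at least two neighbors) and only illustrates what interleaving the two steps can look like: each round classifies with the current links, then predicts links with the current labels. All names and data are hypothetical.

```python
from collections import Counter
from itertools import combinations

def classify(adj, labels, known):
    """One sweep of a majority-vote relational classifier over unlabeled nodes."""
    changed = False
    for node in sorted(adj):
        if node in known:
            continue
        votes = Counter(labels[n] for n in adj[node] if n in labels)
        if not votes:
            continue
        best = min(votes, key=lambda lab: (-votes[lab], lab))  # ties: smallest label
        if labels.get(node) != best:
            labels[node] = best
            changed = True
    return changed

def predict_links(adj, labels, min_common=2):
    """Add links between unlinked same-label pairs with enough common neighbors."""
    new = [(u, v) for u, v in combinations(sorted(adj), 2)
           if v not in adj[u] and len(adj[u] & adj[v]) >= min_common
           and u in labels and labels[u] == labels.get(v)]
    for u, v in new:
        adj[u].add(v)
        adj[v].add(u)
    return new

def interleave(edges, known_labels, max_rounds=10):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    labels, known, added = dict(known_labels), set(known_labels), []
    for _ in range(max_rounds):
        changed = classify(adj, labels, known)  # labels use the current links
        new = predict_links(adj, labels)        # links use the current labels
        added.extend(new)
        if not changed and not new:
            break
    return labels, sorted(added)

# Hypothetical toy graph: nodes 1, 2 and 5 have known labels.
EDGES = [(1, 3), (2, 3), (1, 4), (2, 4), (5, 6)]
labels, added = interleave(EDGES, {1: "A", 2: "A", 5: "B"})
print(added)  # [(1, 2), (3, 4)]
```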
An Algorithmic Approach to Social Networks
- PhD thesis, MIT Computer Science and Artificial Intelligence Laboratory
, 2005
The time-series link prediction problem with applications in communication surveillance
- INFORMS Journal on Computing
, 2009
Cited by 8 (0 self)
The ability to predict linkages among data objects is central to many data mining tasks, such as product recommendation and social network analysis. A substantial literature has been devoted to the link prediction problem, either as an implicitly embedded problem in specific applications or as a generic data mining task. This literature has mostly adopted a static graph representation, where a snapshot of the network is analyzed to predict hidden or future links. However, this representation is only appropriate for investigating whether a certain link will ever occur, and it does not apply to many applications in which the prediction of repeated link occurrences is of main interest (e.g., communication network surveillance). In this paper, we introduce the time-series link prediction problem, which takes into consideration the temporal evolution of link occurrences to predict link occurrence probabilities at a particular time. Using the Enron email data and high-energy particle physics literature co-authorship data, we have demonstrated that time-series models of single link occurrences achieved link prediction performance comparable to commonly used static graph link prediction algorithms. Furthermore, combining static graph link prediction algorithms with time-series models produced significantly better predictions than static graph link prediction methods alone, demonstrating the great potential of integrated methods that exploit both inter-link structural dependencies and intra-link temporal dependencies. Key words: analysis of algorithms; communication networks; link prediction; statistical analysis; time series analysis.
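To make the temporal view concrete, here is a minimal sketch that assumes simple exponential smoothing as the time-series model (the abstract does not say which models were used) and a fixed weighted blend with a static-graph score. The smoothing factor, blend weight, function names and data are all hypothetical. Both toy pairs have linked twice, so a static snapshot cannot separate them, but the forecast favors the pair that linked recently.

```python
def smoothed_forecast(series, alpha=0.5):
    """Simple exponential smoothing over per-period link occurrences
    (1 = the pair communicated in that period); returns the next-period forecast."""
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level

def combined_score(series, static_score, alpha=0.5, weight=0.5):
    """Blend the temporal forecast with a static-graph score assumed to lie in [0, 1]."""
    return weight * smoothed_forecast(series, alpha) + (1 - weight) * static_score

# Hypothetical per-period histories for two pairs that have each linked twice.
HISTORY = {("a", "b"): [0, 0, 1, 0, 1], ("a", "c"): [1, 1, 0, 0, 0]}
for pair, series in HISTORY.items():
    print(pair, smoothed_forecast(series))
# ('a', 'b') 0.625
# ('a', 'c') 0.125
```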

