Results 1-8 of 8
Joint Latent Topic Models for Text and Citations
2008
Cited by 31 (5 self)
In this work, we address the problem of joint modeling of text and citations in the topic modeling framework. We present two different models, called the Pairwise-Link-LDA and the Link-PLSA-LDA models. The Pairwise-Link-LDA model combines the ideas of LDA [4] and Mixed Membership Block Stochastic Models [1] and allows modeling arbitrary link structure. However, the model is computationally expensive, since it involves modeling the presence or absence of a citation (link) between every pair of documents. The second model solves this problem by assuming that the link structure is a bipartite graph. As the name indicates, the Link-PLSA-LDA model combines the LDA and PLSA models into a single graphical model. Our experiments on a subset of Citeseer data show that both models predict unseen data better than the baseline model of Erosheva and Lafferty [8], by capturing the notion of topical similarity between the contents of the cited and citing documents. Our experiments on two different data sets show that the Link-PLSA-LDA model performs best on the citation (link) prediction task, while also remaining highly scalable. In addition, we present some interesting visualizations generated by each of the models.
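A minimal sketch of how a fitted Link-PLSA-LDA model could rank candidate citations, under a simplified reading of the model: we assume each citing document has topic proportions `theta` and each topic k has a multinomial `omega[k]` over cited documents, so that p(d' | d) = Σ_k θ_{d,k} Ω_{k,d'}. The function names and input layout are hypothetical, and parameter estimation (the hard part) is not shown.

```python
from typing import Dict, List


def citation_scores(theta: List[float], omega: List[Dict[str, float]]) -> Dict[str, float]:
    """Return p(d' | d) = sum_k theta[k] * omega[k][d'] for every candidate d'.

    theta: topic proportions of the citing document (hypothetical input).
    omega: one distribution over cited documents per topic (hypothetical input).
    """
    scores: Dict[str, float] = {}
    for weight, topic_dist in zip(theta, omega):
        for doc, prob in topic_dist.items():
            scores[doc] = scores.get(doc, 0.0) + weight * prob
    return scores


def rank_citations(theta: List[float], omega: List[Dict[str, float]], top_n: int = 3) -> List[str]:
    """Rank candidate cited documents by predicted citation probability."""
    scores = citation_scores(theta, omega)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Two topics; the citing document is mostly about topic 0.
theta = [0.8, 0.2]
omega = [{"A": 0.7, "B": 0.3}, {"B": 0.1, "C": 0.9}]
print(rank_citations(theta, omega))  # ['A', 'B', 'C']
```

Because θ and each Ω_k are distributions, the scores over all candidates sum to one, so they can be read directly as predicted citation probabilities.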
Who Should I Cite? Learning Literature Search Models from Citation Behavior
Cited by 4 (0 self)
Scientists depend on literature search to find prior work that is relevant to their research ideas. We introduce a retrieval model for literature search that incorporates a wide variety of factors important to researchers, and learns the weights of each of these factors by observing citation patterns. We introduce features like topical similarity and author behavioral patterns, and combine these with features from related work like citation count and recency of publication. We present an iterative process for learning weights for these features that alternates between retrieving articles with the current retrieval model, and updating model weights by training a supervised classifier on these articles. We propose a new task for evaluating the resulting retrieval models, where the retrieval system takes only an abstract as its input and must produce as output the list of references at the end of the abstract’s article. We evaluate our model on a collection of journal, conference and workshop articles from the ACL Anthology Reference Corpus. Our model achieves a mean average precision of 28.7, a 12.8 point improvement over a term similarity baseline, and a significant improvement both over models using only features from related work and over models without our iterative learning.
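The evaluation task above scores a ranked reference list against the article's true references. A sketch of standard non-interpolated mean average precision, assuming this is the MAP variant used (the abstract does not say); the reported 28.7 corresponds to 0.287 on the scale computed here.

```python
from typing import Iterable, Sequence, Set, Tuple


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Non-interpolated AP: mean of precision@k over ranks k where a true reference appears.

    References the system never retrieves contribute zero precision.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(runs: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average AP over queries (here, one query per input abstract)."""
    aps = [average_precision(ranked, relevant) for ranked, relevant in runs]
    return sum(aps) / len(aps) if aps else 0.0


runs = [
    (["p1", "p2", "p3", "p4"], {"p1", "p3"}),  # AP = (1/1 + 2/3) / 2
    (["q1", "q2"], {"q2", "q9"}),              # AP = (1/2) / 2; q9 never retrieved
]
print(round(mean_average_precision(runs), 4))  # 0.5417
```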
Dynamic egocentric models for citation networks
In Proc. 28th Intl. Conf. on Machine Learning, 2011
Cited by 3 (2 self)
The analysis of the formation and evolution of networks over time is of fundamental importance to social science, biology, and many other fields. While longitudinal network data sets are increasingly being recorded at the granularity of individual time-stamped events, most studies focus only on collapsed cross-sectional snapshots of the network. In this paper, we introduce a dynamic egocentric framework that models continuous-time network data using multivariate counting processes. For inference, an efficient partial likelihood approach is used, allowing our methods to scale to large networks. We apply our techniques to various citation networks and demonstrate the predictive power and interpretability of the learned statistical models.
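Egocentric counting-process models of this kind typically give each node i an intensity λ_i(t) = λ_0(t) exp(βᵀ s_i(t)); the Cox partial likelihood then cancels the baseline λ_0(t), which is one reason such an approach can scale. A minimal sketch of the log partial likelihood, assuming the statistic vectors s_j(t) for every node at risk are precomputed at each event time; the input format and the example statistic are hypothetical, and tied event times are ignored.

```python
import math
from typing import List, Sequence, Tuple

# One event: (index of the node that received the citation, statistic vectors s_j(t) for every node j at risk).
Event = Tuple[int, Sequence[Sequence[float]]]


def log_partial_likelihood(beta: Sequence[float], events: List[Event]) -> float:
    """Sum over events of beta.s_chosen(t) - log sum_j exp(beta.s_j(t)).

    The baseline intensity lambda_0(t) cancels, so it never has to be estimated.
    """
    total = 0.0
    for chosen, features in events:
        scores = [sum(b * x for b, x in zip(beta, s)) for s in features]
        m = max(scores)  # log-sum-exp shift for numerical stability
        log_denominator = m + math.log(sum(math.exp(v - m) for v in scores))
        total += scores[chosen] - log_denominator
    return total


# Hypothetical single statistic per node, e.g. its recent in-degree.
events = [
    (0, [[2.0], [0.0], [1.0]]),
    (1, [[0.0], [3.0]]),
]
print(log_partial_likelihood([0.0], events))  # equals -log(3) - log(2)
print(log_partial_likelihood([0.5], events))
```

In practice β would be fit by maximizing this function with any gradient-based optimizer.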
A Discriminative Approach to Topic-based Citation Recommendation
Cited by 2 (0 self)
In this paper, we present a study of a novel problem, namely topic-based citation recommendation, which involves recommending papers to be cited. This problem has traditionally been treated as an engineering issue and handled with heuristics. This paper gives a formalization of topic-based citation recommendation and proposes a discriminative approach to this problem. Specifically, it proposes a two-layer Restricted Boltzmann Machine model, called RBM-CS, which can discover topic distributions of paper content and citation relationships simultaneously. Experimental results demonstrate that RBM-CS can significantly outperform baseline methods for citation recommendation.
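A sketch of the two conditional steps of a generic binary RBM whose visible layer holds word units and citation units: infer hidden "topic" activations from a paper's words, then score candidate citations from those activations. This is not RBM-CS itself; it assumes mean-field hidden probabilities instead of samples, omits training (e.g. contrastive divergence), and all weights below are made up.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def hidden_probs(words: Sequence[float], W: Sequence[Sequence[float]], b_hidden: Sequence[float]) -> List[float]:
    """p(h_j = 1 | v) = sigmoid(b_j + sum_i v_i * W[i][j]); the hidden units play the role of topics."""
    return [
        sigmoid(b + sum(v * W[i][j] for i, v in enumerate(words)))
        for j, b in enumerate(b_hidden)
    ]


def citation_probs(h: Sequence[float], U: Sequence[Sequence[float]], a_cite: Sequence[float]) -> List[float]:
    """p(c_k = 1 | h) = sigmoid(a_k + sum_j h_j * U[k][j]), using mean-field h rather than samples."""
    return [
        sigmoid(a + sum(hj * U[k][j] for j, hj in enumerate(h)))
        for k, a in enumerate(a_cite)
    ]


# Toy weights: 2 word units, 2 hidden units, 2 candidate citations (all values made up).
W = [[2.0, -2.0], [0.0, 0.0]]
U = [[3.0, -3.0], [-3.0, 3.0]]
h = hidden_probs([1.0, 0.0], W, [0.0, 0.0])
scores = citation_probs(h, U, [0.0, 0.0])
print([round(s, 3) for s in scores])
```

Candidate papers would then be recommended in decreasing order of `scores`.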
Using Terms from Citations for IR: Some First Results
In Proc. of the 29th European Conference on Information Retrieval (ECIR 2007), 2007
Cited by 2 (0 self)
We present the results of experiments using terms from citations for scientific literature search. To index a given document, we use the terms that citing documents use to describe that document, in combination with terms from the document itself. We find that the combination of terms gives better retrieval performance than standard indexing of the document terms alone, and we present a brief analysis of our results. This paper reports the first experimental results from a new test collection of scientific papers, which we created to study citation-based methods for IR.
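A toy sketch of the indexing idea: merge a document's own terms with the terms citing documents use to describe it. The `citation_weight` parameter and the raw term-overlap score are hypothetical stand-ins; a real system would feed the merged representation to a proper ranking function such as BM25.

```python
from collections import Counter
from typing import Iterable, List


def index_terms(doc_tokens: Iterable[str], citation_contexts: Iterable[List[str]], citation_weight: float = 1.0) -> Counter:
    """Merge a document's own terms with terms from citing documents' descriptions of it."""
    terms: Counter = Counter(doc_tokens)
    for context in citation_contexts:
        for token in context:
            terms[token] += citation_weight
    return terms


def overlap_score(query: Iterable[str], terms: Counter) -> float:
    """Toy scoring: total (weighted) frequency of the query terms in the index entry."""
    return sum(terms[q] for q in query)


doc = ["topic", "model", "inference"]
contexts = [["lda", "topic"], ["latent", "dirichlet", "allocation"]]
query = ["lda"]
print(overlap_score(query, index_terms(doc, [])))        # 0: the paper never says "lda"
print(overlap_score(query, index_terms(doc, contexts)))  # 1.0: a citing paper does
```

The example shows the intuition behind the reported gain: citing authors often describe a paper with vocabulary the paper itself does not use.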
Ranking Techniques for Cluster Based Search Results in a Textual Knowledge-base
This paper presents a framework and methodology to improve the search experience in digital library systems. The approach taken is to cluster a textual knowledge base along multiple relations and return search results in the form of small, focused clusters. Specifically, we generate multiple relationship networks, one per relationship type, and then cluster these networks. At search time, we present a ranked set of clusters, with one ranking per relationship type. The intuition for this approach is that returning clusters of contextually related information gives users situational and contextual awareness of the search results, rather than returning a ranked list of only those documents that match the query. We address the use of both implicit (such as textual content) and explicit (such as citations and authors) relations between documents. The primary question we focus on is how to rank the clusters, given a search query. We explore two approaches: a text-based rank (using the text's similarity to the user's query) and a social network-based rank (using information centrality). A comparison of these two ranking methods suggests that information centrality is very useful for ranking clusters and their documents, because the documents that characterize a cluster receive the highest rank.
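Information centrality (Stephenson and Zelen) can be computed from B = L + J, where L is the graph Laplacian and J the all-ones matrix: with C = B⁻¹, node i scores 1 / (c_ii + (T - 2R)/n), where T is the trace of C and R any row sum of C. A dependency-free sketch for a small, connected, undirected relationship network; how the paper turns node scores into a cluster score is not stated here, so only ranking documents within one cluster is shown, and the example graph is hypothetical.

```python
from typing import List, Sequence, Tuple


def _invert(m: Sequence[Sequence[float]]) -> List[List[float]]:
    """Gauss-Jordan inversion with partial pivoting (fine for small cluster graphs)."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [x / p for x in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]


def information_centrality(n: int, edges: Sequence[Tuple[int, int]]) -> List[float]:
    """Stephenson-Zelen information centrality of nodes 0..n-1 in a connected, undirected graph."""
    b = [[1.0] * n for _ in range(n)]  # start from J, the all-ones matrix
    for u, v in edges:                 # add the graph Laplacian L
        b[u][u] += 1.0
        b[v][v] += 1.0
        b[u][v] -= 1.0
        b[v][u] -= 1.0
    c = _invert(b)
    trace = sum(c[i][i] for i in range(n))
    row_sum = sum(c[0])  # every row of C sums to the same value (1/n)
    return [1.0 / (c[i][i] + (trace - 2.0 * row_sum) / n) for i in range(n)]


# Hypothetical cluster: documents 0-1 and 1-2 are linked (e.g. shared authors).
scores = information_centrality(3, [(0, 1), (1, 2)])
ranking = sorted(range(3), key=lambda i: scores[i], reverse=True)
print(ranking[0])  # 1
```

Equivalently, each score is n divided by the node's total effective resistance to all other nodes, so documents that sit "between" the others in the cluster rank first.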
General Terms
We present an adversarial information retrieval approach to the automatic detection of spam content in social bookmarking websites. Our approach is based on the intuitive notion that similar users and posts use similar language. We detect malicious users on the basis of a similarity function that adopts language modeling at two different levels of granularity: at the level of individual posts, and at an aggregated user level, where all posts of one user are merged into a single profile. We evaluate our approach on two spam-annotated data sets representing snapshots of the social bookmarking websites CiteULike and BibSonomy. We find that our approach achieves promising results across data sets, with AUC scores ranging from 0.92 to 0.96.
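A minimal sketch of one way to turn "similar users use similar language" into a spam score at the user level, where a profile is all of a user's posts merged: build Jelinek-Mercer-smoothed unigram language models, measure KL divergence to labeled profiles, and take the spam fraction among the k nearest. The abstract only says "a similarity function that adopts language modeling", so KL divergence, the k-NN vote, `lam` and `k` are our assumptions, and the token lists are made up.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def unigram_lm(tokens: Sequence[str], background: Dict[str, float], lam: float = 0.8) -> Dict[str, float]:
    """Jelinek-Mercer smoothing: lam * P_ML(w) + (1 - lam) * P_bg(w). Assumes non-empty tokens and 0 <= lam < 1."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: lam * counts[w] / total + (1 - lam) * pb for w, pb in background.items()}


def kl_divergence(p: Dict[str, float], q: Dict[str, float]) -> float:
    return sum(pw * math.log(pw / q[w]) for w, pw in p.items() if pw > 0)


def spam_score(profile: Sequence[str], labeled: List[Tuple[Sequence[str], bool]], k: int = 3, lam: float = 0.8) -> float:
    """Fraction of spammers among the k labeled profiles whose language models are closest (lowest KL)."""
    bg_counts = Counter(profile)
    for tokens, _ in labeled:
        bg_counts.update(tokens)
    n = sum(bg_counts.values())
    background = {w: c / n for w, c in bg_counts.items()}
    target = unigram_lm(profile, background, lam)
    distances = sorted(
        (kl_divergence(target, unigram_lm(tokens, background, lam)), is_spam)
        for tokens, is_spam in labeled
    )
    nearest = distances[:k]
    return sum(1 for _, is_spam in nearest if is_spam) / len(nearest)


labeled = [
    (["cheap", "pills", "buy"], True),
    (["buy", "cheap", "viagra"], True),
    (["topic", "model", "lda"], False),
    (["citation", "network", "model"], False),
]
print(spam_score(["cheap", "buy", "pills"], labeled, k=2))  # 1.0
print(spam_score(["topic", "model", "lda"], labeled, k=2))  # 0.0
```

Ranking users by this score is what an AUC evaluation like the one reported would consume.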
Using Co-views Information to Learn Lecture Recommendations
Content-based methods are commonly adopted to address the cold-start problem in recommender systems. In the cold-start scenario, usage information about an item and/or a user's item-preference information is unavailable because the item or the user is new to the system. Thus, collaborative filtering strategies cannot be employed; instead, item-specific attributes or user profile information are used to make recommendations. We focus on lecture recommendations for the videolectures.net data made available as part of the ECML/PKDD Discovery Challenge. We propose using co-view information based on previously seen lecture pairs to learn the weights of lecture attributes for ranking lectures in the cold-start recommendation task. Co-viewed triplet and pair information is also used to estimate the probability that a lecture would be seen, given a set of previously seen lectures. Our results corroborate the effectiveness of using co-view information in learning lecture recommendations.
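A sketch of a pair-only estimate of the probability that a lecture is seen given previously seen lectures: for each seen lecture s, estimate P(candidate | s) as co-views(s, candidate) / views(s), then average. The averaging rule and function names are hypothetical; the paper also uses triplets, which are not modeled here, and the sessions are made up.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def count_coviews(sessions: Sequence[Sequence[str]]) -> Tuple[Counter, Counter]:
    """Count how many sessions saw each lecture and each unordered lecture pair."""
    singles: Counter = Counter()
    pairs: Counter = Counter()
    for seen in sessions:
        unique = sorted(set(seen))
        singles.update(unique)
        pairs.update(combinations(unique, 2))  # keys are sorted tuples
    return singles, pairs


def p_seen_given(candidate: str, history: List[str], singles: Counter, pairs: Counter) -> float:
    """Average over seen lectures s of P(candidate | s) = co-views(s, candidate) / views(s)."""
    estimates = []
    for s in history:
        if singles[s] == 0 or s == candidate:
            continue
        key = tuple(sorted((s, candidate)))
        estimates.append(pairs[key] / singles[s])
    return sum(estimates) / len(estimates) if estimates else 0.0


sessions = [["a", "b"], ["a", "b", "c"], ["a", "c"]]
singles, pairs = count_coviews(sessions)
print(p_seen_given("b", ["a"], singles, pairs))       # 2/3
print(p_seen_given("c", ["a", "b"], singles, pairs))  # (2/3 + 1/2) / 2
```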

