L.S.: Learning optimal ranking with tensor factorization for tag recommendation
 In: KDD ’09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
, 2009
Cited by 60 (3 self)
Tag recommendation is the task of predicting a personalized list of tags for a user given an item. This is important for many websites with tagging capabilities like last.fm or delicious. In this paper, we propose a method for tag recommendation based on tensor factorization (TF). In contrast to other TF methods like higher order singular value decomposition (HOSVD), our method RTF (‘ranking with tensor factorization’) directly optimizes the factorization model for the best personalized ranking. RTF handles missing values and learns from pairwise ranking constraints. Our optimization criterion for TF is motivated by a detailed analysis of the problem and of interpretation schemes for the observed data in tagging systems. In all, RTF directly optimizes for the actual problem using a correct interpretation of the data. We provide a gradient descent algorithm to solve our optimization problem. We also provide an improved learning and prediction method with runtime complexity analysis for RTF. The prediction runtime of RTF is independent of the number of observations and only depends on the factorization dimensions. Besides the theoretical analysis, we empirically show that our method outperforms other state-of-the-art tag recommendation methods like FolkRank, PageRank and HOSVD both in quality and prediction runtime.
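As a rough illustration of the idea in this abstract, the sketch below scores (user, item, tag) triples with a Tucker-style factorization and counts violated pairwise ranking constraints. The Tucker form, the function names and the toy factor values are assumptions made for illustration; they are not taken from the paper, and no learning step is shown.

```python
from itertools import product


def score(core, U, I, T, u, i, t):
    """Tucker-style score: sum over a, b, c of core[a][b][c] * U[u][a] * I[i][b] * T[t][c]."""
    dims = product(range(len(U[u])), range(len(I[i])), range(len(T[t])))
    return sum(core[a][b][c] * U[u][a] * I[i][b] * T[t][c] for a, b, c in dims)


def rank_tags(core, U, I, T, u, i):
    """Return all tag indices for the post (u, i), best score first."""
    return sorted(range(len(T)),
                  key=lambda t: score(core, U, I, T, u, i, t),
                  reverse=True)


def pairwise_violations(core, U, I, T, u, i, positive_tags):
    """Count (observed, unobserved) tag pairs where the observed tag does not score higher."""
    pos = set(positive_tags)
    neg = [t for t in range(len(T)) if t not in pos]
    return sum(1 for tp in pos for tn in neg
               if score(core, U, I, T, u, i, tp) <= score(core, U, I, T, u, i, tn))


# Hypothetical toy factors: 1 user, 1 item, 3 tags, every factor dimension equal to 2.
core = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]
U = [[1.0, 0.5]]
I = [[1.0, 1.0]]
T = [[0.9, 0.1], [0.2, 0.2], [0.1, 0.8]]

print(rank_tags(core, U, I, T, 0, 0))
print(pairwise_violations(core, U, I, T, 0, 0, [0]))
```

Scoring one triple touches only the core tensor and three factor rows, which is consistent with the abstract's statement that prediction runtime depends on the factorization dimensions rather than on the number of observations.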
HADI: Mining radii of large graphs
 ACM Transactions on Knowledge Discovery from Data
, 2010
Cited by 33 (10 self)
Given large, multi-million node graphs (e.g., Facebook, web-crawls, etc.), how do they evolve over time? How are they connected? What are the central nodes and the outliers? In this paper we define the Radius plot of a graph and show how it can answer these questions. However, computing the Radius plot is prohibitively expensive for graphs reaching the planetary scale. There are two major contributions in this paper: (a) We propose HADI (HAdoop DIameter and radii estimator), a carefully designed and fine-tuned algorithm to compute the radii and the diameter of massive graphs, that runs on the top of the Hadoop/MapReduce system, with excellent scale-up on the number of available machines; (b) We run HADI on several real world datasets including YahooWeb (6B edges, 1/8 of a Terabyte), one of the largest public graphs ever analyzed. Thanks to HADI, we report fascinating patterns on large networks, like the surprisingly small effective diameter, the multi-modal/bi-modal shape of the Radius plot, and its palindrome motion over time.
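To make the quantities in this abstract concrete, the sketch below computes per-node radii, their distribution and the diameter of a tiny graph with exact breadth-first search. This is not HADI: it is a single-machine, exact stand-in that would not scale to the graph sizes discussed. It also assumes that a node's radius is its eccentricity (the largest shortest-path distance to any reachable node); the function names and the toy graph are hypothetical.

```python
from collections import Counter, deque


def eccentricity(adj, source):
    """Largest BFS distance from source to any node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return max(dist.values())


def radius_plot(adj):
    """Return (radius per node, histogram of radii, diameter)."""
    radii = {node: eccentricity(adj, node) for node in adj}
    return radii, Counter(radii.values()), max(radii.values())


# Hypothetical toy graph: the undirected path 0 - 1 - 2 - 3 - 4.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
radii, histogram, diameter = radius_plot(adj)
print(radii, dict(histogram), diameter)
```

In this toy graph the middle node has the smallest radius and the two endpoints have the largest, which is the sense in which radii separate central nodes from outliers.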
A Unified Framework for Providing Recommendations in Social Tagging Systems Based on Ternary Semantic Analysis
Cited by 28 (4 self)
Social Tagging is the process by which many users add metadata in the form of keywords, to annotate and categorize items (songs, pictures, web links, products, etc.). Social tagging systems (STSs) can provide three different types of recommendations: They can recommend 1) tags to users, based on what tags other users have used for the same items, 2) items to users, based on tags they have in common with other similar users, and 3) users with common social interest, based on common tags on similar items. However, users may have different interests for an item, and items may have multiple facets. In contrast to the current recommendation algorithms, our approach develops a unified framework to model the three types of entities that exist in a social tagging system: users, items, and tags. These data are modeled by a 3-order tensor, on which multi-way latent semantic analysis and dimensionality reduction are performed using both the Higher Order Singular Value Decomposition (HOSVD) method and the Kernel-SVD smoothing technique. We perform experimental comparison of the proposed method against state-of-the-art recommendation algorithms with two real data sets (Last.fm and BibSonomy). Our results show significant improvements in terms of effectiveness measured through recall/precision. Index Terms: Social tags, recommender systems, tensors, HOSVD.
Radius Plots for Mining Terabyte Scale Graphs: Algorithms, Patterns, and Observations
Cited by 22 (16 self)
Given large, multi-million node graphs (e.g., FaceBook, web-crawls, etc.), how do they evolve over time? How are they connected? What are the central nodes and the outliers of the graphs? We show that the Radius Plot (pdf of node radii) can answer these questions. However, computing the Radius Plot is prohibitively expensive for graphs reaching the planetary scale. There are two major contributions in this paper: (a) We propose HADI (HAdoop DIameter and radii estimator), a carefully designed and fine-tuned algorithm to compute the diameter of massive graphs, that runs on the top of the HADOOP/MAPREDUCE system, with excellent scale-up on the number of available machines; (b) We run HADI on several real world datasets including YahooWeb (6B edges, 1/8 of a Terabyte), one of the largest public graphs ever analyzed. Thanks to HADI, we report fascinating patterns on large networks, like the surprisingly small effective diameter, the multi-modal/bi-modal shape of the Radius Plot, and its palindrome motion over time.
GigaTensor: Scaling Tensor Analysis Up By 100 Times Algorithms and Discoveries
Cited by 21 (6 self)
Many data are modeled as tensors, or multi-dimensional arrays. Examples include the predicates (subject, verb, object) in knowledge bases, hyperlinks and anchor texts in the Web graphs, sensor streams (time, location, and type), social networks over time, and DBLP conference-author-keyword relations. Tensor decomposition is an important data mining tool with various applications including clustering, trend detection, and anomaly detection. However, current tensor decomposition algorithms are not scalable for large tensors with billions of sizes and hundreds of millions of nonzeros: the largest tensor in the literature remains thousands of sizes and hundreds of thousands of nonzeros. Consider a knowledge base tensor consisting of about 26 million noun-phrases. The intermediate data explosion problem, associated with naive implementations of tensor decomposition algorithms, would require the materialization and the storage of a matrix whose largest dimension would be ≈ 7·10^14; this amounts to ∼ 10 Petabytes, or equivalently a few data centers worth of storage, thereby rendering the tensor analysis of this knowledge base, in the naive way, practically impossible. In this paper, we propose GIGATENSOR, a scalable distributed algorithm for large scale tensor decomposition. GIGATENSOR exploits the sparseness of the real world tensors, and avoids the intermediate data explosion problem by carefully redesigning the tensor decomposition algorithm. Extensive experiments show that our proposed GIGATENSOR solves 100× bigger problems than existing methods. Furthermore, we employ GIGATENSOR in order to analyze a very large real world, knowledge base tensor and present our astounding findings which include discovery of potential synonyms among millions of noun-phrases (e.g. the noun ‘pollutant’ and the noun-phrase ‘greenhouse gases’).
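The intermediate data explosion described in this abstract can be illustrated on a toy scale. The sketch below assumes a CP/PARAFAC-style decomposition, where a naive update multiplies the unfolded tensor by a Khatri-Rao product of two factor matrices, and compares that with accumulating the same result directly over the nonzeros. It is a generic single-machine sketch, not the distributed GIGATENSOR algorithm; the function names and toy values are hypothetical, and reading the quoted dimension as the product of two mode sizes of about 26 million is also an assumption.

```python
def mttkrp_sparse(nonzeros, B, C, num_rows):
    """Accumulate X_(1) (C khatri-rao B) over nonzeros only; no J*K-row matrix is built."""
    rank = len(B[0])
    out = [[0.0] * rank for _ in range(num_rows)]
    for (i, j, k), value in nonzeros.items():
        for r in range(rank):
            out[i][r] += value * B[j][r] * C[k][r]
    return out


def mttkrp_naive(nonzeros, B, C, num_rows):
    """Reference version that materializes the full Khatri-Rao product first."""
    J, K, rank = len(B), len(C), len(B[0])
    # Row k * J + j of the Khatri-Rao product is the elementwise product of C[k] and B[j].
    khatri_rao = [[C[k][r] * B[j][r] for r in range(rank)]
                  for k in range(K) for j in range(J)]
    out = [[0.0] * rank for _ in range(num_rows)]
    for (i, j, k), value in nonzeros.items():
        row = khatri_rao[k * J + j]
        for r in range(rank):
            out[i][r] += value * row[r]
    return out


# Hypothetical toy tensor of shape 2 x 3 x 2 with two nonzeros, and rank-2 factors.
nonzeros = {(0, 0, 0): 2.0, (1, 2, 1): 3.0}
B = [[1.0, 2.0], [0.5, 0.5], [2.0, 1.0]]
C = [[1.0, 1.0], [3.0, 0.5]]

print(mttkrp_sparse(nonzeros, B, C, num_rows=2))

# Rows of the materialized product if two modes each had about 26 million entries.
rows_if_materialized = 26_000_000 ** 2
print(rows_if_materialized)
```

The sparse version does work proportional to the number of nonzeros times the rank, while the naive version first builds a matrix with one row per pair of indices in the other two modes.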
Multivis: Content-based social network exploration through multi-way visual analysis
 In SDM
, 2009
Cited by 17 (3 self)
With the explosion of social media, scalability becomes a key challenge. There are two main aspects of the problems that arise: 1) data volume: how to manage and analyze huge datasets to efficiently extract patterns, 2) data understanding: how to facilitate understanding of the patterns by users? To address both aspects of the scalability challenge, we present a hybrid approach that leverages two complementary disciplines, data mining and information visualization. In particular, we propose 1) an analytic data model for content-based networks using tensors; 2) an efficient high-order clustering framework for analyzing the data; 3) a scalable context-sensitive graph visualization to present the clusters. We evaluate the proposed methods using both synthetic and real datasets. In terms of computational efficiency, the proposed methods are an order of magnitude faster compared to the baseline. In terms of effectiveness, we present several case studies of real corporate social networks.
A Classification for Community Discovery Methods in Complex Networks
, 2011
Cited by 16 (6 self)
Many real-world networks are intimately organized according to a community structure. Much research effort has been devoted to developing methods and algorithms that can efficiently highlight this hidden structure of a network, yielding a vast literature on what is called today community detection. Since network representation can be very complex and can contain different variants in the traditional graph model, each algorithm in the literature focuses on some of these properties and establishes, explicitly or implicitly, its own definition of community. According to this definition, each proposed algorithm then extracts the communities, which typically reflect only part of the features of real communities. The aim of this survey is to provide a ‘user manual’ for the community discovery problem. Given a meta definition of what a community in a social network is, our aim is to organize the main categories of community discovery methods based on the definition of community they adopt. Given a desired definition of community and the features of a problem (size of network, direction of edges, multidimensionality, and so on) this review paper is designed to provide a set of approaches that researchers could focus on. The proposed classification of community discovery methods is also useful for putting into perspective the many open
Multi-Way Compressed Sensing for Sparse Low-Rank Tensors
, 2012
Cited by 9 (1 self)
For linear models, compressed sensing theory and methods enable recovery of sparse signals of interest from few measurements, in the order of the number of nonzero entries as opposed to the length of the signal of interest. Results of similar flavor have more recently emerged for bilinear models, but no results are available for multilinear models of tensor data. In this contribution, we consider compressed sensing for sparse and low-rank tensors. More specifically, we consider low-rank tensors synthesized as sums of outer products of sparse loading vectors, and a special class of linear dimensionality-reducing transformations that reduce each mode individually. We prove interesting “oracle” properties showing that it is possible to identify the uncompressed sparse loadings directly from the compressed tensor data. The proofs naturally suggest a two-step recovery process: fitting a low-rank model in compressed domain, followed by per-mode decompression. This two-step process is also appealing from a computational complexity and memory capacity point of view, especially for big tensor datasets.
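A small numerical check may help show why compressing each mode individually fits a per-mode recovery process. The sketch below builds a rank-one tensor from sparse loading vectors, compresses every mode with its own matrix, and confirms that the result equals the outer product of the individually compressed loadings. All names, sizes and integer values are hypothetical, and the recovery of the sparse loadings themselves is not shown.

```python
def outer3(a, b, c):
    """Rank-one third-order tensor with entries a[i] * b[j] * c[k]."""
    return [[[x * y * z for z in c] for y in b] for x in a]


def matvec(M, v):
    """Plain matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def compress(X, U, V, W):
    """Multiply mode 1 of X by U, mode 2 by V and mode 3 by W."""
    I, J, K = len(X), len(X[0]), len(X[0][0])
    return [[[sum(U[p][i] * V[q][j] * W[s][k] * X[i][j][k]
                  for i in range(I) for j in range(J) for k in range(K))
              for s in range(len(W))]
             for q in range(len(V))]
            for p in range(len(U))]


# Hypothetical sparse loadings and per-mode compression matrices (integers keep the check exact).
a, b, c = [1, 0, 2, 0], [0, 3, 0], [1, 1, 0, 0]
U = [[1, 2, 0, 1], [0, 1, 1, 0]]   # 4 -> 2
V = [[1, 0, 2], [1, 1, 1]]         # 3 -> 2
W = [[1, 0, 1, 0], [2, 1, 0, 1]]   # 4 -> 2

compressed = compress(outer3(a, b, c), U, V, W)
factored = outer3(matvec(U, a), matvec(V, b), matvec(W, c))
print(compressed == factored)
```

Because the compressed tensor keeps the same outer-product structure, a low-rank model fitted in the compressed domain yields compressed loadings, and each of those can then be treated as an ordinary one-dimensional sparse recovery problem for its own mode.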
A Tensor-based Factorization Model of Semantic Compositionality
Cited by 8 (0 self)
In this paper, we present a novel method for the computation of compositionality within a distributional framework. The key idea is that compositionality is modeled as a multi-way interaction between latent factors, which are automatically constructed from corpus data. We use our method to model the composition of subject verb object triples. The method consists of two steps. First, we compute a latent factor model for nouns from standard co-occurrence data. Next, the latent factors are used to induce a latent model of three-way subject verb object interactions. Our model has been evaluated on a similarity task for transitive phrases, in which it exceeds the state of the art.
Cross-Tagging for Personalized Open Social Networking
 CONFERENCE ON HYPERTEXT AND HYPERMEDIA
, 2009
Cited by 8 (0 self)
The Social Web is successfully established and poised for continued growth. Web 2.0 applications such as blogs, bookmarking, music, photo and video sharing systems are among the most popular; and all of them incorporate a social aspect, i.e., users can easily share information with other users. But due to the diversity of these applications – serving different aims – the Social Web is ironically divided. Blog users who write about music for example, could possibly benefit from other users registered in other social systems operating within the same domain, such as a social radio station. Although these sites are two different and disconnected systems, offering distinct services to the users, the fact that domains are compatible could benefit users from both systems with interesting and multifaceted information. In this paper we propose to automatically establish social links between distinct social systems through cross-tagging, i.e., enriching a social system with the tags of other similar social system(s). Since tags are known for increasing the prediction quality of recommender systems (RS), we propose to quantitatively evaluate the extent to which users can benefit from cross-tagging by measuring the impact of different cross-tagging approaches on tag-aware RS for personalized resource recommendations. We conduct experiments in real world data sets and empirically show the effectiveness of our approaches.