Evaluating collaborative filtering recommender systems
 ACM Transactions on Information Systems
, 2004
Abstract

Cited by 570 (13 self)
© ACM, 2004. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM
Maximizing the Spread of Influence Through a Social Network
 In KDD
, 2003
Abstract

Cited by 471 (6 self)
Models for the processes by which ideas and influence propagate through a social network have been studied in a number of domains, including the diffusion of medical and technological innovations, the sudden and widespread adoption of various strategies in game-theoretic settings, and the effects of “word of mouth” in the promotion of new products. Recently, motivated by the design of viral marketing strategies, Domingos and Richardson posed a fundamental algorithmic problem for such social network processes: if we can try to convince a subset of individuals to adopt a new product or innovation, and the goal is to trigger a large cascade of further adoptions, which set of individuals should we target? We consider this problem in several of the most widely studied models in social network analysis. The optimization problem of selecting the most influential nodes is NP-hard here, and we provide the first provable approximation guarantees for efficient algorithms. Using an analysis framework based on submodular functions, we show that a natural greedy strategy obtains a solution that is provably within 63% of optimal for several classes of models; our framework suggests a general approach for reasoning about the performance guarantees of algorithms for these types of influence problems in social networks. We also provide computational experiments on large collaboration networks, showing that in addition to their provable guarantees, our approximation algorithms significantly outperform node-selection heuristics based on the well-studied notions of degree centrality and distance centrality from the field of social networks.
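The greedy strategy described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the independent cascade model with one uniform activation probability `p`, estimates expected spread by Monte Carlo simulation, and uses hypothetical names (`simulate_cascade`, `expected_spread`, `greedy_seeds`). The 63% figure corresponds to the 1 - 1/e ≈ 0.632 bound for greedy maximization of a monotone submodular function.

```python
import random


def simulate_cascade(graph, seeds, p, rng):
    """Run one independent-cascade simulation; return the number of active nodes.

    graph maps each node to a list of its out-neighbours (assumed format).
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for node in frontier:
            for neighbour in graph.get(node, ()):
                # A newly active node gets one chance to activate each inactive neighbour.
                if neighbour not in active and rng.random() < p:
                    active.add(neighbour)
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return len(active)


def expected_spread(graph, seeds, p, runs, rng):
    """Monte Carlo estimate of the expected number of activated nodes."""
    return sum(simulate_cascade(graph, seeds, p, rng) for _ in range(runs)) / runs


def greedy_seeds(graph, k, p=0.1, runs=200, seed=0):
    """Pick k seeds, each time adding the node with the largest estimated spread."""
    rng = random.Random(seed)
    nodes = set(graph) | {v for nbrs in graph.values() for v in nbrs}
    chosen = []
    for _ in range(min(k, len(nodes))):
        best_node, best_spread = None, -1.0
        for candidate in sorted(nodes - set(chosen)):
            spread = expected_spread(graph, chosen + [candidate], p, runs, rng)
            if spread > best_spread:
                best_node, best_spread = candidate, spread
        chosen.append(best_node)
    return chosen


# Toy graph, made up for illustration only.
toy_graph = {"a": ["b", "c", "d"], "b": [], "e": ["f"]}
print(greedy_seeds(toy_graph, k=2, p=1.0, runs=1))
```

With `p=1.0` every edge fires, so the simulation is deterministic and the sketch simply picks the nodes that reach the most others. For realistic probabilities the estimate is noisy, and `runs` trades accuracy against running time.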
Mining Knowledge-Sharing Sites for Viral Marketing
, 2002
Abstract

Cited by 210 (8 self)
Viral marketing takes advantage of networks of influence among customers to inexpensively achieve large changes in behavior. Our research seeks to put it on a firmer footing by mining these networks from data, building probabilistic models of them, and using these models to choose the best viral marketing plan. Knowledge-sharing sites, where customers review products and advise each other, are a fertile source for this type of data mining. In this paper we extend our previous techniques, achieving a large reduction in computational cost, and apply them to data from a knowledge-sharing site. We optimize the amount of marketing funds spent on each customer, rather than just making a binary decision on whether to market to him. We take into account the fact that knowledge of the network is partial, and that gathering that knowledge can itself have a cost. Our results show the robustness and utility of our approach.
Trust management for the semantic web
 In ISWC
, 2003
Abstract

Cited by 195 (3 self)
Though research on the Semantic Web has progressed at a steady pace, its promise has yet to be realized. One major difficulty is that, by its very nature, the Semantic Web is a large, uncensored system to which anyone may contribute. This raises the question of how much credence to give each source. We cannot expect each user to know the trustworthiness of each source, nor would we want to assign top-down or global credibility values due to the subjective nature of trust. We tackle this problem by employing a web of trust, in which each user provides personal trust values for a small number of other users. We compose these trusts to compute the trust a user should place in any other user in the network. A user is not assigned a single trust rank. Instead, different users may have different trust values for the same user. We define properties for combination functions which merge such trusts, and define a class of functions for which merging may be done locally while maintaining these properties. We give examples of specific functions and apply them to data from Epinions and our BibServ bibliography server. Experiments confirm that the methods are robust to noise, and do not put unreasonable expectations on users. We hope that these methods will help move the Semantic Web closer to fulfilling its promise.
Measuring user influence in Twitter: The million follower fallacy
 in ICWSM ’10: Proceedings of international AAAI Conference on Weblogs and Social
, 2010
Abstract

Cited by 153 (14 self)
Directed links in social media could represent anything from intimate friendships to common interests, or even a passion for breaking news or celebrity gossip. Such directed links determine the flow of information and hence indicate a user’s influence on others, a concept that is crucial in sociology and viral marketing. In this paper, using a large amount of data collected from Twitter, we present an in-depth comparison of three measures of influence: indegree, retweets, and mentions. Based on these measures, we investigate the dynamics of user influence across topics and time. We make several interesting observations. First, popular users who have high indegree are not necessarily influential in terms of spawning retweets or mentions. Second, most influential users can hold significant influence over a variety of topics. Third, influence is not gained spontaneously or accidentally, but through concerted effort such as limiting tweets to a single topic. We believe that these findings provide new insights for viral marketing and suggest that topological measures such as indegree alone reveal very little about the influence of a user.
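One way to compare influence measures such as indegree, retweets, and mentions is to check how similarly they rank the same users. The sketch below computes Spearman's rank correlation in plain Python. The abstract does not say which statistic the paper uses, so this choice, the function names, and the toy numbers are all assumptions for illustration.

```python
def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: the Pearson correlation of the two rank vectors.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = ranks(xs), ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Made-up per-user counts, not data from the paper.
indegree = [5000, 120, 900, 40]
retweets = [10, 300, 45, 2]
print(spearman(indegree, retweets))
```

A value near 1 would mean the two measures rank users almost identically; a low value would be consistent with the abstract's point that high indegree does not imply many retweets or mentions.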
Why Collective Inference Improves Relational Classification
 In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
, 2004
Abstract

Cited by 111 (24 self)
Procedures for collective inference make simultaneous statistical judgments about the same variables for a set of related data instances. For example, collective inference could be used to simultaneously classify a set of hyperlinked documents or infer the legitimacy of a set of related financial transactions. Several recent studies indicate that collective inference can significantly reduce classification error when compared with traditional inference techniques. We investigate the underlying mechanisms for this error reduction by reviewing past work on collective inference and characterizing different types of statistical models used for making inference in relational data. We show important differences among these models, and we characterize the necessary and sufficient conditions for reduced classification error based on experiments with real and simulated data.
ANF: A Fast and Scalable Tool for Data Mining in Massive Graphs
 INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING
, 2002
Abstract

Cited by 94 (19 self)
Graphs are an increasingly important data source, with such important graphs as the Internet and the Web. Other familiar graphs include CAD circuits, phone records, gene sequences, city streets, social networks and academic citations. Any kind of relationship, such as actors appearing in movies, can be represented as a graph. This work presents a data mining tool, called ANF, that can quickly answer a number of interesting questions on graph-represented data, such as the following. How robust is the Internet to failures? What are the most influential database papers? Are there gender differences in movie appearance patterns? At its core, ANF is based on a fast and memory-efficient approach for approximating the complete "neighbourhood function" for a graph. For the Internet graph (268K nodes), ANF's highly accurate approximation is more than 700 times faster than the exact computation. This reduces the running time from nearly a day to a matter of a minute or two, allowing users to perform ad hoc drill-down tasks and to repeatedly answer questions about changing data sources. To enable this drill-down, ANF employs new techniques for approximating neighbourhood-type functions for graphs with distinguished nodes and/or edges. When compared to the best existing approximation, ANF's approach is both faster and more accurate, given the same resources. Additionally, unlike previous approaches, ANF scales gracefully to handle disk resident graphs. Finally, we present some of our results from mining large graphs using ANF.
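To make the "neighbourhood function" concrete, the sketch below computes it exactly by breadth-first search from every node. This is the slow baseline that ANF approximates, not ANF itself. It assumes N(h) counts the ordered node pairs (u, v), including u = v, with v reachable from u in at most h hops; that definition and the name `neighbourhood_function` are our reading, since the abstract does not spell them out.

```python
from collections import deque


def neighbourhood_function(graph, max_hops):
    """Return [N(0), N(1), ..., N(max_hops)] for an adjacency-list graph.

    graph maps each node to a list of its out-neighbours (assumed format).
    """
    nodes = set(graph) | {v for nbrs in graph.values() for v in nbrs}
    pairs_at_distance = [0] * (max_hops + 1)
    for source in nodes:
        # Breadth-first search from source, truncated at max_hops.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            if dist[node] == max_hops:
                continue
            for nbr in graph.get(node, ()):
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        for d in dist.values():
            pairs_at_distance[d] += 1
    # Accumulate so that entry h counts pairs at distance <= h.
    result, total = [], 0
    for count in pairs_at_distance:
        total += count
        result.append(total)
    return result


# Undirected path a - b - c, stored with edges in both directions.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(neighbourhood_function(path, 2))  # [3, 7, 9]
```

The exact computation runs one search per node, which is what becomes impractical on graphs of the size the abstract mentions and motivates an approximation.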
Epidemic Spreading in Real Networks: An Eigenvalue Viewpoint
 In SRDS
, 2003
Abstract

Cited by 79 (18 self)
How will a virus propagate in a real network? Does an epidemic threshold exist for a finite power-law graph, or any finite graph? How long does it take to disinfect a network given particular values of infection rate and virus death rate? We answer the first question by providing equations that accurately model virus propagation in any network including real and synthesized network graphs. We propose a general epidemic threshold condition that applies to arbitrary graphs: we prove that, under reasonable approximations, the epidemic threshold for a network is closely related to the largest eigenvalue of its adjacency matrix. Finally, for the last question, we show that infections tend to zero exponentially below the epidemic threshold. We show that our epidemic threshold model subsumes many known thresholds for special-case graphs (e.g., Erdős-Rényi, BA power-law, homogeneous); we show that the threshold tends to zero for infinite power-law graphs. Finally, we illustrate the predictive power of our model with extensive experiments on real and synthesized graphs. We show that our threshold condition holds for arbitrary graphs.
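The link between the epidemic threshold and the largest adjacency eigenvalue can be illustrated with a small sketch. The abstract only says the two are "closely related"; the code below assumes the commonly stated form, threshold = 1/λ1, with an infection expected to die out when the ratio of infection rate to death rate falls below it. The eigenvalue is estimated by power iteration on A + I (the shift avoids oscillation on bipartite graphs). All names are hypothetical, and the graph is assumed to be undirected, with every node present as a key.

```python
def largest_eigenvalue(adj, iterations=1000, tol=1e-12):
    """Estimate the largest eigenvalue of a symmetric 0/1 adjacency structure.

    adj maps each node to a list of neighbours; each edge must be listed
    in both directions, and every neighbour must also appear as a key.
    """
    nodes = sorted(adj)
    x = {v: 1.0 for v in nodes}
    estimate = 0.0
    for _ in range(iterations):
        # One step of power iteration with the shifted matrix A + I.
        y = {v: x[v] + sum(x[u] for u in adj[v]) for v in nodes}
        norm = sum(val * val for val in y.values()) ** 0.5
        y = {v: val / norm for v, val in y.items()}
        # Rayleigh quotient of A at the unit vector y.
        new_estimate = sum(y[v] * sum(y[u] for u in adj[v]) for v in nodes)
        converged = abs(new_estimate - estimate) < tol
        estimate, x = new_estimate, y
        if converged:
            break
    return estimate


def epidemic_threshold(adj):
    """Assumed form of the threshold: the reciprocal of the largest eigenvalue."""
    return 1.0 / largest_eigenvalue(adj)


# Star graph with one hub and four leaves; its largest eigenvalue is sqrt(4) = 2.
star = {"hub": ["l1", "l2", "l3", "l4"],
        "l1": ["hub"], "l2": ["hub"], "l3": ["hub"], "l4": ["hub"]}
print(epidemic_threshold(star))
```

Under this assumed form, a better-connected graph has a larger λ1 and therefore a lower threshold, which matches the abstract's remark that the threshold tends to zero for infinite power-law graphs.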
Influential Nodes in a Diffusion Model for Social Networks
 IN ICALP
, 2005
Abstract

Cited by 77 (2 self)
We study the problem of maximizing the expected spread of an innovation or behavior within a social network, in the presence of "word-of-mouth" referral. Our work builds on the observation that individuals' decisions to purchase a product or adopt an innovation are strongly influenced by recommendations from their friends and acquaintances. Understanding
Graph mining: Laws, generators, and algorithms
 ACM COMPUTING SURVEYS
, 2006
Abstract

Cited by 72 (7 self)
How does the Web look? How could we tell an abnormal social network from a normal one? These and similar questions are important in many fields where the data can intuitively be cast as a graph; examples range from computer networks to sociology to biology and many more. Indeed, any M : N relation in database terminology can be represented as a graph. A lot of these questions boil down to the following: "How can we generate synthetic but realistic graphs?" To answer this, we must first understand what patterns are common in real-world graphs and can thus be considered a mark of normality/realism. This survey gives an overview of the incredible variety of work that has been done on these problems. One of our main contributions is the integration of points of view from physics, mathematics, sociology, and computer science. Further, we briefly describe recent advances on some related and interesting graph problems.