## Modeling Information Diffusion in Implicit Networks

Citations: 28 (0 self)

### BibTeX

@MISC{Yang_modelinginformation,
    author = {Jaewon Yang and Jure Leskovec},
    title = {Modeling Information Diffusion in Implicit Networks},
    year = {}
}

### Abstract

Social media forms a central domain for the production and dissemination of real-time information. Even though such flows of information have traditionally been thought of as diffusion processes over social networks, the underlying phenomena are the result of a complex web of interactions among numerous participants. Here we develop a Linear Influence Model where rather than requiring the knowledge of the social network and then modeling the diffusion by predicting which node will influence which other nodes in the network, we focus on modeling the global influence of a node on the rate of diffusion through the (implicit) network. We model the number of newly infected nodes as a function of which other nodes got infected in the past. For each node we estimate an influence function that quantifies how many subsequent infections can be attributed to the influence of that node over time. A nonparametric formulation of the model leads to a simple least squares problem that can be solved on large datasets. We validate our model on a set of 500 million tweets and a set of 170 million news articles and blog posts. We show that the Linear Influence Model accurately models influences of nodes and reliably predicts the temporal dynamics of information diffusion. We find that patterns of influence of individual participants differ significantly depending on the type of the node and the topic of the information.
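The least squares formulation described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names, the synthetic data, the 0.5 participation probability, and the choice of cyclic coordinate descent as the non-negative least squares solver are all assumptions made for illustration. Each row of the indicator matrix records which nodes mentioned a contagion at which lag, and the target is the volume at the next time step.

```python
import random


def design_rows(infection_times, num_nodes, L, T):
    """Indicator rows for one contagion (hypothetical helper).

    infection_times maps node -> time step at which it mentioned the contagion.
    Row t has a 1 in column u*L + (lag-1) when node u mentioned the contagion
    exactly `lag` steps before time t+1, for 1 <= lag <= L.
    """
    rows = []
    for t in range(T):
        row = [0.0] * (num_nodes * L)
        for u, t_u in infection_times.items():
            lag = (t + 1) - t_u
            if 1 <= lag <= L:
                row[u * L + lag - 1] = 1.0
        rows.append(row)
    return rows


def nnls_coordinate_descent(A, y, sweeps=100):
    """Minimize ||A x - y||^2 subject to x >= 0 by cyclic coordinate descent.

    Assumption: this simple solver stands in for whatever NNLS method is
    used in practice; it is chosen only because it needs no dependencies.
    """
    n = len(A[0])
    x = [0.0] * n
    residual = list(y)  # residual = y - A x, and x starts at zero
    col_sq = [sum(row[j] ** 2 for row in A) for j in range(n)]
    for _ in range(sweeps):
        for j in range(n):
            if col_sq[j] == 0.0:
                continue  # column never active: keep the coefficient at 0
            grad = sum(row[j] * r for row, r in zip(A, residual))
            new_xj = max(0.0, x[j] + grad / col_sq[j])
            delta = new_xj - x[j]
            if delta != 0.0:
                for i, row in enumerate(A):
                    if row[j]:
                        residual[i] -= delta * row[j]
                x[j] = new_xj
    return x


def flatten(influence):
    """Concatenate per-node influence functions into one coefficient vector."""
    return [w for node in influence for w in node]


def simulate(true_influence, num_contagions, T, seed=0):
    """Generate noise-free synthetic volumes from known influence functions."""
    rng = random.Random(seed)
    num_nodes, L = len(true_influence), len(true_influence[0])
    weights = flatten(true_influence)
    A, y = [], []
    for _ in range(num_contagions):
        # Assumption: each node joins a contagion with probability 0.5.
        times = {u: rng.randrange(T) for u in range(num_nodes)
                 if rng.random() < 0.5}
        for row in design_rows(times, num_nodes, L, T):
            A.append(row)
            y.append(sum(a * w for a, w in zip(row, weights)))
    return A, y


# Made-up influence functions for 3 nodes over L = 3 lags.
true_influence = [[5.0, 3.0, 1.0], [2.0, 2.0, 0.0], [0.0, 4.0, 1.0]]
A, y = simulate(true_influence, num_contagions=60, T=10)
estimate = nnls_coordinate_descent(A, y)
max_error = max(abs(e - w) for e, w in zip(estimate, flatten(true_influence)))
print(max_error)
```

Because the synthetic volumes are generated without noise, the estimated influence functions should closely match the ones used to generate the data. On real data the fit would be approximate, and a dedicated sparse solver would be preferable at scale.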

### Citations

1620 |
Time Series Analysis: Forecasting and Control
- Box, Jenkins
- 1976
Citation Context ...ion for the volume at the next time, $\hat{V}_k(t+1) = V_k(t)$. We also consider two standard time series regression methods: the Autoregressive Model (AR), and the Autoregressive Moving Average Model (ARMA) [5] both of order L. The AR model is equivalent to a special case of LIM where we assume that all the nodes have the same influence function. ARMA uses AR with an additional ingredient, the moving averag...

1246 |
Diffusion of Innovations
- Rogers
- 1995
Citation Context ...nd blogs or employing their social networks. Thus, even though flows of information and influence have traditionally been thought of as diffusion processes over underlying social networks [13], [15], [29], [31] existing models and formulations may be too constrained to capture the complexity of the underlying phenomena. Modeling diffusion and temporal variation. Here we address the above issues by dev...

498 |
Ridge regression: Biased estimation for nonorthogonal problems
- Hoerl, Kennard, et al.
- 1970
Citation Context ...culation. We use the Reflective Newton Method [9] which takes less than a second to solve a problem with K = 1,000, L = 10, T = 120, and N = 100. In practice we also apply the Tikhonov regularization [19], which has the effect of smoothing the non-parametric estimates. Extensions: Accounting for novelty. So far, we have assumed that a node has the same influence regardless of how early or late in the ...

323 |
Threshold models of collective behavior
- Granovetter
- 1978
Citation Context ...ll remains. Traditionally, models of diffusion and cascading behavior have formalized the spread of ideas, information and influence as processes taking place on social and information networks [13], [15], [31], where each individual node is either active (infected, influenced) or inactive, and active nodes can then spread the contagion (information, influence, disease) along the edges of the underlyi...

238 |
The Mathematical Theory of Infectious Diseases and its Applications
- Bailey
- 1975
Citation Context ...any subsequent purchases a node influences) is of considerable interest. Similarly, in epidemiology and virus propagation, we observe people getting sick without usually knowing how they got infected [4]. Here our model allows us to estimate the number of subsequent infections produced by each node without the knowledge of the network. II. PROPOSED METHOD Next we formally introduce the Linear Influen...

238 | The dynamics of viral marketing
- Leskovec, Adamic, et al.
- 2007
Citation Context ...diffusion in social networks has proven to be a challenging task. It is difficult to obtain large scale diffusion data and to identify and track on a large scale the elements, such as recommendations [25], links [27], [28], tags [8], [7], topics [3], phrases or “memes” [26], that spread and propagate through networks. Even if one does obtain large scale real-world diffusion data, however, the issue of...

153 | Measuring User Influence in Twitter: The Million Follower Fallacy
- Cha, Haddadi, et al.
- 2010
Citation Context ...rk structure, and flow over time opens interesting questions about the large-scale behavior in information networks. Even though the diffusion of information has been an active research area recently [7], [12], [26], [28], modeling the diffusion in social networks has proven to be a challenging task. It is difficult to obtain large scale diffusion data and to identify and track on a large scale the e...

150 |
Meme-tracking and the dynamics of the news cycle
- Leskovec, Backstrom, et al.
- 2009
Citation Context ...e, and flow over time opens interesting questions about the large-scale behavior in information networks. Even though the diffusion of information has been an active research area recently [7], [12], [26], [28], modeling the diffusion in social networks has proven to be a challenging task. It is difficult to obtain large scale diffusion data and to identify and track on a large scale the elements, suc...

145 |
A simple model of global cascades on random networks
- Watts
- 2002
Citation Context ...ains. Traditionally, models of diffusion and cascading behavior have formalized the spread of ideas, information and influence as processes taking place on social and information networks [13], [15], [31], where each individual node is either active (infected, influenced) or inactive, and active nodes can then spread the contagion (information, influence, disease) along the edges of the underlying net...

134 |
Talk of the network: A complex systems look at the underlying process of word-of-mouth
- Goldenberg, Libai, et al.
Citation Context ...ss still remains. Traditionally, models of diffusion and cascading behavior have formalized the spread of ideas, information and influence as processes taking place on social and information networks [13], [15], [31], where each individual node is either active (infected, influenced) or inactive, and active nodes can then spread the contagion (information, influence, disease) along the edges of the un...

102 | Algorithm AS 136: A k-means clustering algorithm
- Hartigan, Wong
- 1979

96 |
Social ties and word-of-mouth referral behavior
- Brown, Reingen
- 1987
Citation Context ...number of nodes got infected by the contagion we model the influence of these nodes on the overall volume and the temporal dynamics of the diffusion. This setting naturally applies to viral marketing [6], [18], where we observe people purchasing products or adopting particular behavior without explicitly knowing who was the influencer. Thus, for viral marketing, estimating the influence functions (i....

82 |
Personal Influence: The Part Played by People in the Flow of Mass Communications
- Katz, Lazarsfeld
- 1955
Citation Context ...he news cycle takes [23], [16]. For example, some websites may act as “influentials” or early adopters [32]. Bloggers and mainstream media are pushing new content into the system in different manners [22], [11], and often the content generated by blogs is regarded to be more credible than that from the mainstream media [21]. In this paper we aim to develop an understanding of the mechanisms by which t...

78 | Cascading Behavior in Large Blog Graphs
- Leskovec, McGlohon, et al.
- 2007
Citation Context ... social networks has proven to be a challenging task. It is difficult to obtain large scale diffusion data and to identify and track on a large scale the elements, such as recommendations [25], links [27], [28], tags [8], [7], topics [3], phrases or “memes” [26], that spread and propagate through networks. Even if one does obtain large scale real-world diffusion data, however, the issue of modeling th...

77 |
Tracking information epidemics in blogspace
- Adar, Adamic
- 2005
Citation Context ...a challenging task. It is difficult to obtain large scale diffusion data and to identify and track on a large scale the elements, such as recommendations [25], links [27], [28], tags [8], [7], topics [3], phrases or “memes” [26], that spread and propagate through networks. Even if one does obtain large scale real-world diffusion data, however, the issue of modeling the underlying process still remain...

69 | Network-based marketing: Identifying likely adopters
- Hill, Provost, et al.
Citation Context ...r of nodes got infected by the contagion we model the influence of these nodes on the overall volume and the temporal dynamics of the diffusion. This setting naturally applies to viral marketing [6], [18], where we observe people purchasing products or adopting particular behavior without explicitly knowing who was the influencer. Thus, for viral marketing, estimating the influence functions (i.e., ho...

68 | A Measurement-driven Analysis of Information Propagation in the Flickr Social Network
- Cha, Mislove, et al.
- 2009
Citation Context ...has proven to be a challenging task. It is difficult to obtain large scale diffusion data and to identify and track on a large scale the elements, such as recommendations [25], links [27], [28], tags [8], [7], topics [3], phrases or “memes” [26], that spread and propagate through networks. Even if one does obtain large scale real-world diffusion data, however, the issue of modeling the underlying pro...

64 |
A Structural Theory of Social Influence
- Friedkin
- 2006
Citation Context ...pants in the dynamics of diffusion? Linear Influence Model (LIM). We consider the temporal variation in a diffusion-based framework and build on the view adopted by the literature on social influence [10], [20]. We formulate the Linear Influence Model (LIM) by starting with the assumption that the number of newly infected nodes depends on which other nodes got infected in the past. We then model the n...

58 | Learning influence probabilities in social networks
- Goyal, Bonchi, et al.
- 2010
Citation Context ...e to the heterogeneity of the nodes and data sparsity. Only recently has the availability of large social network and corresponding diffusion data made it possible to estimate such models in practice [14], [30]. When using such models and fitting them to real-world data one makes several assumptions: (a) complete network data is available, (b) contagion can only spread over the edges of the underlying...

56 | Novelty and collective attention
- Wu, Huberman
- 2007
Citation Context ...n very early or very late. However, nodes are more likely to adopt novel and recent information while ignoring old and obsolete information. In order to account for this effect of recency and novelty [33] we introduce a multiplicative factor α(t) that models how much more/less influential a node is at the time when it mentions the information. We refer to this model as α-LIM: $V_k(t+1) = \alpha(t) \sum_{u=1}^{N}$ ...

55 | Influentials, networks, and public opinion formation
- Watts, Dodds
Citation Context ...rators (or echo chambers), while the mainstream media imparts a dominant force in the direction the news cycle takes [23], [16]. For example, some websites may act as “influentials” or early adopters [32]. Bloggers and mainstream media are pushing new content into the system in different manners [22], [11], and often the content generated by blogs is regarded to be more credible than that from the mai...

49 | Patterns of temporal variation in online media
- Yang, Leskovec
Citation Context ...e of nodes. Table I shows the relative reduction in error over the 1-time lag predictor on the Memetracker data for all phrases, and also for phrases grouped based on the shape of the volume over time [2]. While AR and ARMA give 7.5% improvement, LIM and its variants outperform AR and ARMA by a factor of two. We find the results to be similar for predicting the adoption of Twitter hashtags (table not ...

49 | A Reflective Newton Method for Minimizing a Quadratic Function Subject to Bounds on Some of the Variables
- Coleman, Li
- 1996
Citation Context ...olved efficiently even for a large number of nodes and contagions. The sparse nature of the influence indicator matrix M helps to further expedite the calculation. We use the Reflective Newton Method [9] which takes less than a second to solve a problem with K = 1,000, L = 10, T = 120, and N = 100. In practice we also apply the Tikhonov regularization [19], which has the effect of smoothing the non-p...

44 |
Tracing information flow on a global scale using Internet chain-letter data
- Liben-Nowell, Kleinberg
- 2008
Citation Context ... flow over time opens interesting questions about the large-scale behavior in information networks. Even though the diffusion of information has been an active research area recently [7], [12], [26], [28], modeling the diffusion in social networks has proven to be a challenging task. It is difficult to obtain large scale diffusion data and to identify and track on a large scale the elements, such as r...

32 |
Modeling blog dynamics
- Goetz, Leskovec, et al.
- 2009
Citation Context ...ructure, and flow over time opens interesting questions about the large-scale behavior in information networks. Even though the diffusion of information has been an active research area recently [7], [12], [26], [28], modeling the diffusion in social networks has proven to be a challenging task. It is difficult to obtain large scale diffusion data and to identify and track on a large scale the element...

18 |
Information flow modeling based on diffusion rate for prediction and ranking
- Song, Chi, et al.
- 2007
Citation Context ...he heterogeneity of the nodes and data sparsity. Only recently has the availability of large social network and corresponding diffusion data made it possible to estimate such models in practice [14], [30]. When using such models and fitting them to real-world data one makes several assumptions: (a) complete network data is available, (b) contagion can only spread over the edges of the underlying netwo...

16 | Naive learning in social networks: Convergence, influence, and the wisdom of crowds
- Golub, Jackson
- 2007
Citation Context ...in the dynamics of diffusion? Linear Influence Model (LIM). We consider the temporal variation in a diffusion-based framework and build on the view adopted by the literature on social influence [10], [20]. We formulate the Linear Influence Model (LIM) by starting with the assumption that the number of newly infected nodes depends on which other nodes got infected in the past. We then model the number ...

9 |
Wag the blog: How reliance on traditional media and the Internet influence credibility perceptions of Weblogs among blog users
- Johnson, Kaye
- 2004
Citation Context ... mainstream media are pushing new content into the system in different manners [22], [11], and often the content generated by blogs is regarded to be more credible than that from the mainstream media [21]. In this paper we aim to develop an understanding of the mechanisms by which the rate of diffusion rises and decays over time. What causes certain information cascades to grow large and why others re...

2 |
The rumour bomb: Theorising the convergence of new and old trends in mediated U.S. politics
- Harsin
Citation Context ...es play an amplifying role, blogs can serve both as early detectors and elaborators (or echo chambers), while the mainstream media imparts a dominant force in the direction the news cycle takes [23], [16]. For example, some websites may act as “influentials” or early adopters [32]. Bloggers and mainstream media are pushing new content into the system in different manners [22], [11], and often the cont...

1 |
How can we measure the influence of the blogosphere? Workshop on the Weblogging Ecosystem
- Gill
- 2004
Citation Context ...s cycle takes [23], [16]. For example, some websites may act as “influentials” or early adopters [32]. Bloggers and mainstream media are pushing new content into the system in different manners [22], [11], and often the content generated by blogs is regarded to be more credible than that from the mainstream media [21]. In this paper we aim to develop an understanding of the mechanisms by which the rat...

1 |
Solving least squares problems. 3rd edition
- Lawson, Hanson
- 1995
Citation Context ...et. Then we set entries (i,j) of matrix M to 1 for i = kT(t+l) where $\|\cdot\|_2^2$ denotes the squared Euclidean norm. The above optimization problem is called a non-negative least squares (NNLS) problem [24] and can be solved efficiently even for a large number of nodes and contagions. The sparse nature of the influence indicator matrix M helps to further expedite the calculation. We use the Reflective N...