Inferring Networks of Diffusion and Influence
Cited by 28 (4 self)
Information diffusion and virus propagation are fundamental processes taking place in networks. While it is often possible to directly observe when nodes become infected, observing individual transmissions (i.e., who infects whom or who influences whom) is typically very difficult. Furthermore, in many applications, the underlying network over which the diffusions and propagations spread is actually unobserved. We tackle these challenges by developing a method for tracing paths of diffusion and influence through networks and inferring the networks over which contagions propagate. Given the times when nodes adopt pieces of information or become infected, we identify the optimal network that best explains the observed infection times. Since the optimization problem is NP-hard to solve exactly, we develop an efficient approximation algorithm that scales to large datasets and in practice gives provably near-optimal performance. We demonstrate the effectiveness of our approach by tracing information cascades in a set of 170 million blogs and news articles over a one-year period to infer how information flows through the online media space. We find that the diffusion network of news tends to have a core-periphery structure with a small set of core media sites that diffuse information to the rest of the Web. These sites tend to have stable circles of influence with more general news media sites acting as connectors between them.
Patterns of temporal variation in online media
, 2010
Cited by 17 (1 self)
Online content exhibits rich temporal dynamics, and diverse real-time user-generated content further intensifies this process. However, temporal patterns by which online content grows and fades over time, and by which different pieces of content compete for attention, remain largely unexplored. We study temporal patterns associated with online content and how the content’s popularity grows and fades over time. The attention that content receives on the Web varies depending on many factors and occurs on very different time scales and at different resolutions. In order to uncover the temporal dynamics of online content we formulate a time series clustering problem using a similarity metric that is invariant to scaling and shifting. We develop the K-Spectral Centroid (K-SC) clustering algorithm that effectively finds cluster centroids with our similarity measure. By applying an adaptive wavelet-based incremental approach to clustering, we scale K-SC to large data sets. We demonstrate our approach on two massive datasets: a set of 580 million Tweets, and a set of 170 million blog posts and news media articles. We find that K-SC outperforms the K-means clustering algorithm in finding distinct shapes of time series. Our analysis shows that there are six main temporal shapes of attention of online content. We also present a simple model that reliably predicts the shape of attention by using information about only a small number of participants. Our analyses offer insight into common temporal patterns of the content on the Web and broaden the understanding of the dynamics of human attention.
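The abstract above mentions a similarity metric that is invariant to scaling and shifting but does not give its formula. The sketch below shows one way such a measure could be defined, as an assumption for illustration only: for each candidate time shift, the second series is rescaled by the least-squares optimal factor, and the smallest relative residual is returned. The function names `shift` and `scale_shift_distance` and the zero-padding convention are hypothetical choices, not taken from the listing.

```python
import math


def shift(series, q):
    """Shift a series by q steps (right if q > 0, left if q < 0), padding with zeros."""
    n = len(series)
    if q >= 0:
        return [0.0] * min(q, n) + list(series[: max(n - q, 0)])
    return list(series[-q:]) + [0.0] * min(-q, n)


def scale_shift_distance(x, y, max_shift=None):
    """Relative residual between x and the best scaled, shifted copy of y.

    Assumed form (illustrative): min over shift q and scale alpha of
    ||x - alpha * y_q|| / ||x||, with alpha solved in closed form.
    """
    n = len(x)
    if max_shift is None:
        max_shift = n - 1
    norm_x = math.sqrt(sum(v * v for v in x))
    if norm_x == 0:
        raise ValueError("x must not be the all-zero series")
    best = float("inf")
    for q in range(-max_shift, max_shift + 1):
        y_q = shift(y, q)
        denom = sum(v * v for v in y_q)
        # Least-squares optimal scale for this shift.
        alpha = sum(a * b for a, b in zip(x, y_q)) / denom if denom > 0 else 0.0
        resid = math.sqrt(sum((a - alpha * b) ** 2 for a, b in zip(x, y_q)))
        best = min(best, resid / norm_x)
    return best


# Same peak shape, different height and position: distance is (near) zero.
same_shape = scale_shift_distance([0, 1, 3, 1, 0, 0], [0, 0, 0, 2, 6, 2])
# A flat series cannot be scaled or shifted into a peak: distance stays large.
different_shape = scale_shift_distance([0, 1, 3, 1, 0, 0], [1, 1, 1, 1, 1, 1])
```

Under this definition, two attention curves that differ only in overall volume or in when they peak are treated as the same shape, which is the property a shape-based clustering needs.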
Differences in the Mechanics of Information Diffusion Across Topics: Idioms, Political Hashtags, and Complex Contagion on Twitter
Cited by 13 (1 self)
There is a widespread intuitive sense that different kinds of information spread differently on-line, but it has been difficult to evaluate this question quantitatively since it requires a setting where many different kinds of information spread in a shared environment. Here we study this issue on Twitter, analyzing the ways in which tokens known as hashtags spread on a network defined by the interactions among Twitter users. We find significant variation in the ways that widely-used hashtags on different topics spread. Our results show that this variation is not attributable simply to differences in “stickiness,” the probability of adoption based on one or more exposures, but also to a quantity that could be viewed as a kind of “persistence”: the relative extent to which repeated exposures to a hashtag continue to have significant marginal effects. We find that hashtags on politically controversial topics are particularly persistent, with repeated exposures continuing to have unusually large marginal effects on adoption; this provides, to our knowledge, the first large-scale validation of the “complex contagion” principle from sociology, which posits that repeated exposures to an idea are particularly crucial when the idea is in some way controversial or contentious. Among other findings, we discover that hashtags representing the natural analogues of Twitter idioms and neologisms are particularly non-persistent, with the effect of multiple exposures decaying rapidly relative to the first exposure. We also study the subgraph structure of the initial adopters for different widely-adopted hashtags, again finding structural differences across topics. We develop simulation-based and generative models to analyze how the adoption dynamics interact with the network structure of the early adopters on which a hashtag spreads.
Rhythm and randomness in human contact
- In Proc. International Conference on Advances in Social Networks Analysis and Mining
, 2010
Cited by 2 (2 self)
There is substantial interest in the effect of human mobility patterns on opportunistic communications. Inspired by recent work revisiting some of the early evidence for a Lévy flight foraging strategy in animals, we analyse datasets on human contact from real world traces. By analysing the distribution of inter-contact times on different time scales and using different graphical forms, we find not only the highly skewed distributions of waiting times highlighted in previous studies but also a clear circadian rhythm. The relative visibility of these two components depends strongly on which graphical form is adopted and the range of time scales. We use a simple model to reconstruct the observed behaviour and discuss the implications of this for forwarding efficiency.
Meme-tracking and the Dynamics of the News Cycle
- Jure Leskovec
Tracking new topics, ideas, and “memes” across the Web has been an issue of considerable interest. Recent work has developed methods for tracking topic shifts over long time scales, as well as abrupt spikes in the appearance of particular named entities. However, these approaches are less well suited to the identification of content that spreads widely and then fades over time scales on the order of days, the time scale at which we perceive news and events. We develop a framework for tracking short, distinctive phrases that travel relatively intact through on-line text; developing scalable algorithms for clustering textual variants of such phrases, we identify a broad class of memes that exhibit wide spread and rich variation on a daily basis. As our principal domain of study, we show how such a meme-tracking approach can provide a coherent representation of the news cycle, the daily rhythms in the news media that have long been the subject of qualitative interpretation but have never been captured accurately enough to permit actual quantitative analysis. We tracked 1.6 million mainstream media sites and blogs over a period of three months with a total of 90 million articles and we find a set of novel and persistent temporal patterns in the news cycle. In particular, we observe a typical lag of 2.5 hours between the peaks of attention to a phrase in the news media and in blogs respectively, with divergent behavior around the overall peak and a “heartbeat”-like pattern in the handoff between news and blogs. We also develop and analyze a mathematical model for the kinds of temporal variation that the system exhibits.
(3) (4)
, 2011
Let’s suppose for simplicity that our Markov chain has only one eigenvector $v$ with eigenvalue 1, so all the other eigenvalues are strictly less than 1 in magnitude. As we saw last time, if we put the eigenvalues and their eigenvectors in decreasing order, $1 = \lambda_1 > |\lambda_2| \ge |\lambda_3| \ge \dots \ge |\lambda_K|$ (1), then when we evolve some initial distribution $p_0$ for many time steps, the second largest eigenvalue dominates: $p_0 q^t = \sum_{i=1}^{K} a_i v_i \lambda_i^t \approx v + a_2 v_2 \lambda_2^t$
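The eigenvalue argument in this excerpt can be checked numerically on a small example. The sketch below uses an assumed two-state chain (the transition probabilities `a` and `b` are illustrative, not from the source), for which the second eigenvalue is known in closed form as 1 - a - b. It evolves an initial distribution and records how much the distance to the stationary distribution shrinks at each step.

```python
def step(p, P):
    """One step of the chain: row vector p times transition matrix P."""
    n = len(P)
    return [sum(p[i] * P[i][j] for i in range(n)) for j in range(n)]


def l1_distance(p, q):
    """Sum of absolute differences between two distributions."""
    return sum(abs(x - y) for x, y in zip(p, q))


# Assumed two-state chain: leave state 0 with prob. a, leave state 1 with prob. b.
a, b = 0.3, 0.1
P = [[1 - a, a],
     [b, 1 - b]]

stationary = [b / (a + b), a / (a + b)]  # eigenvector with eigenvalue 1
lambda_2 = 1 - a - b                     # second eigenvalue of a two-state chain

p = [1.0, 0.0]  # start with all probability mass in state 0
ratios = []
previous = l1_distance(p, stationary)
for _ in range(10):
    p = step(p, P)
    current = l1_distance(p, stationary)
    ratios.append(current / previous)  # per-step shrink factor of the error
    previous = current
```

Each entry of `ratios` comes out equal to `lambda_2` up to floating-point error, so the gap to the stationary distribution decays geometrically at the rate set by the second largest eigenvalue. With more than two states the ratio only approaches the magnitude of the second eigenvalue after the smaller terms have died out.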
Inferring Networks of Diffusion and Influence
Information diffusion and virus propagation are fundamental processes taking place in networks. While it is often possible to directly observe when nodes become infected with a virus or publish the information, observing individual transmissions (who infects whom, or who influences whom) is typically very difficult. Furthermore, in many applications, the underlying network over which the diffusions and propagations spread is actually unobserved. We tackle these challenges by developing a method for tracing paths of diffusion and influence through networks and inferring the networks over which contagions propagate. Given the times when nodes adopt pieces of information or become infected, we identify the optimal network that best explains the observed infection times. Since the optimization problem is NP-hard to solve exactly, we develop an efficient approximation algorithm that scales to large datasets and finds provably near-optimal networks. We demonstrate the effectiveness of our approach by tracing information diffusion in a set of 170 million blogs and news articles over a one year period to infer how information flows through the online media space. We find that the diffusion network of news for the top 1,000 media sites and blogs tends to have a core-periphery structure with a small set of core media sites that diffuse information to the rest of the Web. These sites tend to have stable circles of influence with more general news media sites acting as connectors between them.
Human Dynamics in Large Communication Networks
How often do humans communicate with each other? What are the mechanisms that explain how human actions are distributed over time? Here we answer these questions by studying the time interval between calls and SMS messages in an anonymized, large mobile network, with 3.1 million users, over 200 million phone calls and 300 million SMS messages, spanning 70 gigabytes. Our first contribution is the Truncated Autocatalytic Process (TAP) model, which explains the time between communication events (i.e., times between phone-initiations) for a single individual. The novelty is that the model is ‘autocatalytic’, in the sense that the parameters of the model change depending on the latest inter-event time: long periods of inactivity in the past result in long periods of inactivity in the future, and vice versa. We show that the TAP model mimics the inter-event times of the users of our dataset extremely well, despite its parsimony and simplicity. Our second contribution is the TAP-classifier, a classification method based on the inter-event times in addition to other features. We show that the inferred sleep intervals and the reciprocity between outgoing and incoming calls are good features to classify users. Finally, we analyze the network effects of each class of users and find surprising results. Moreover, all of our methods are fast, and scale linearly with the number of customers.
ABSTRACT 1
, 905
The increasing availability of electronic communication data, such as that arising from e-mail exchange, presents social and information scientists with new possibilities for characterizing individual behavior and, by extension, identifying latent structure in human populations. Here, we propose a model of individual e-mail communication that is sufficiently rich to capture meaningful variability across individuals, while remaining simple enough to be interpretable. We show that the model, a cascading non-homogeneous Poisson process, can be formulated as a double-chain hidden Markov model, allowing us to use an efficient inference algorithm to estimate the model parameters from observed data. We then apply this model to two e-mail data sets consisting of 404 and 6,164 users, respectively, that were collected from two universities in different countries and years. We find that the resulting best-estimate parameter distributions for both data sets are surprisingly similar, indicating that at least some features of communication dynamics generalize beyond specific contexts. We also find that variability of individual behavior over time is significantly less than variability across the population, suggesting that individuals can be classified into persistent “types”. We conclude that communication patterns may prove useful as an additional class of attribute data, complementing demographic and network data, for user classification and outlier detection—a point that we illustrate with an interpretable clustering of users based on their inferred model parameters.

