Results 1–10 of 19
Space-efficient sampling from social activity streams
 In BigMine, 2012
Abstract

Cited by 5 (3 self)
In order to efficiently study the characteristics of network domains and support development of network systems (e.g., algorithms and protocols that operate on networks), it is often necessary to sample a representative subgraph from a large complex network. Although recent subgraph sampling methods have been shown to work well, they focus on sampling from memory-resident graphs and assume that the sampling algorithm can access the entire graph in order to decide which nodes/edges to select. Many large-scale network datasets, however, are too large and/or dynamic to be processed using main memory (e.g., email, tweets, wall posts). In this work, we formulate the problem of sampling from large graph streams. We propose a streaming graph sampling algorithm that dynamically maintains a representative sample in a reservoir-based setting. We evaluate the efficacy of our proposed methods empirically using several real-world data sets. Across all datasets, we found that our method produces samples that better preserve the original graph distributions.
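The reservoir-based setting described above can be illustrated with a minimal sketch. This is a standard single-pass edge-stream reservoir sampler, not necessarily the paper's exact algorithm, and the function name and parameters are hypothetical:

```python
import random

def reservoir_edge_sample(edge_stream, k, seed=0):
    """Keep a uniform sample of at most k edges from a single pass
    over an edge stream, using memory independent of stream length."""
    rng = random.Random(seed)
    reservoir = []
    for i, edge in enumerate(edge_stream):
        if len(reservoir) < k:
            reservoir.append(edge)       # fill phase
        else:
            j = rng.randrange(i + 1)     # uniform index in [0, i]
            if j < k:
                reservoir[j] = edge      # replace with probability k/(i+1)
    return reservoir

# The sampled subgraph is the one induced by the retained edges.
stream = [(u, u + 1) for u in range(1000)]
sample = reservoir_edge_sample(stream, k=50)
```

After the stream ends, every edge has had an equal chance of remaining in the reservoir, which is what makes the induced subgraph a candidate representative sample.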
Graph Sample and Hold: A Framework for Big-Graph Analytics
Abstract

Cited by 5 (1 self)
Sampling is a standard approach in big-graph analytics; the goal is to efficiently estimate the graph properties by consulting a sample of the whole population. A perfect sample is assumed to mirror every property of the whole population. Unfortunately, such a perfect sample is hard to collect in complex populations such as graphs (e.g., web graphs, social networks), where an underlying network connects the units of the population. Therefore, a good sample will be representative in the sense that graph properties of interest can be estimated with a known degree of accuracy. While previous work focused particularly on sampling schemes to estimate certain graph properties (e.g., triangle count), much less is known for the case when we need to estimate various graph properties with the same sampling scheme. In this paper, we propose a generic stream sampling framework for big-graph analytics, …
Online myopic network covering
 In CoRR
Abstract

Cited by 2 (1 self)
Efficient marketing or awareness-raising campaigns seek to recruit n influential individuals – where n is the campaign budget – who are able to cover a large target audience through their social connections. So far, most of the related literature on maximizing this network cover assumes that the social network topology is known. Even in such a case, finding the optimal solution is NP-hard. In practice, however, the network topology is generally unknown and needs to be discovered on-the-fly. In this work we consider an unknown topology where recruited individuals disclose their social connections (a feature known as one-hop lookahead). The goal of this work is to provide an efficient greedy online algorithm that recruits individuals so as to maximize the size of the target audience covered by the campaign. We propose a new greedy online algorithm, Maximum Expected d-Excess Degree (MEED), and provide, to the best of our knowledge, the first detailed theoretical analysis of the cover size of a variety of well-known network sampling algorithms on finite networks. Our proposed algorithm greedily maximizes the expected size of the cover. For a class of random power-law networks we show that MEED simplifies into a straightforward procedure, which we denote MOD (Maximum Observed Degree). We substantiate our analytical results with extensive simulations and show that MOD significantly outperforms all analyzed myopic algorithms. We note that performance may be further improved if the node degree distribution is known or can be estimated online during the campaign.
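A minimal sketch of the MOD idea under one-hop lookahead, based only on a plain reading of the abstract (the paper's exact bookkeeping and tie-breaking may differ, and all names here are hypothetical): recruit, at each step, the observed-but-not-yet-recruited node with the highest observed degree, and let each recruit reveal its neighbor list.

```python
def mod_cover(neighbors, budget, start):
    """Greedy Maximum Observed Degree (MOD) sketch.

    `neighbors` maps node -> iterable of neighbors and is only
    queried for recruited nodes (modeling one-hop lookahead).
    Returns the recruited set and the covered set.
    """
    recruited = set()
    covered = {start}
    observed_deg = {start: 0}   # frontier: seen but not yet recruited
    for _ in range(budget):
        if not observed_deg:
            break               # nothing left to recruit
        node = max(observed_deg, key=observed_deg.get)
        del observed_deg[node]
        recruited.add(node)
        for nb in neighbors[node]:      # lookahead reveals neighbors
            covered.add(nb)
            if nb not in recruited:
                observed_deg[nb] = observed_deg.get(nb, 0) + 1
    return recruited, covered

# On a star graph, recruiting the hub covers everyone in one step.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
recruited, covered = mod_cover(star, budget=2, start=0)
```

The myopic aspect is that the choice uses only degrees observed so far, never the full (unknown) topology.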
Scalable Vaccine Distribution in Large Graphs given Uncertain Data
Abstract

Cited by 1 (0 self)
Given a noisy or sampled snapshot of a network, such as a contact network or the blogosphere, in which an infection (or meme/virus) has been spreading for some time, what are the best nodes to immunize (vaccinate)? Manipulating graphs via node removal is by itself an important problem in multiple domains such as epidemiology, public health, and social media. Moreover, it is important to account for uncertainty, as surveillance data on who is infected is typically limited or sampled. Efficient algorithms for such a problem can help public-health experts make more informed decisions. In this paper, we study the problem of designing vaccine-distribution algorithms under an uncertain environment, with known information consisting of confirmed cases as well as a probability distribution of unknown cases. We formulate the NP-hard Uncertain Data-Aware Vaccination problem, and design multiple efficient algorithms for factorizable distributions (including a novel subquadratic algorithm) which naturally take the uncertainty into account while providing robust solutions. Finally, we show the effectiveness and scalability of our methods via extensive experiments on real datasets, including large epidemiological and social networks.
Impact of Sampling Design in Estimation of Graph Characteristics
Abstract
Studying structural and functional characteristics of large-scale graphs (or networks) has been a challenging task due to the associated computational overhead. Hence, most studies resort to sampling to gather the information necessary to estimate various features of these big networks. On the other hand, using a best-effort approach to graph sampling within the constraints of an application domain may not always produce accurate estimates. In fact, a mismatch between the characteristics of interest and the network sampling methodology used may result in incorrect inferences about the studied characteristics of the underlying system. In this study we empirically investigate the sources of information loss in a sampling process; identify the fundamental factors that need to be carefully considered in a sampling design; and use several synthetic and real-world graphs to demonstrate in detail the mismatch between the sampling design and the graph characteristics of interest.
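The kind of design mismatch discussed above is easy to reproduce. The toy comparison below is an illustration added here, not the paper's experiment: estimating mean degree by uniform node sampling versus sampling a random endpoint of a random edge, which is size-biased toward high-degree hubs.

```python
import random

def est_mean_degree_nodes(degrees, k, rng):
    """Uniform node sampling: an unbiased estimator of mean degree."""
    nodes = list(degrees)
    return sum(degrees[rng.choice(nodes)] for _ in range(k)) / k

def est_mean_degree_edges(edges, degrees, k, rng):
    """Random endpoint of a random edge: size-biased toward
    high-degree nodes, so it overestimates the mean degree."""
    return sum(degrees[rng.choice(rng.choice(edges))] for _ in range(k)) / k

# Star graph on 100 nodes: true mean degree is 2 * 99 / 100 = 1.98.
n = 100
edges = [(0, v) for v in range(1, n)]
degrees = {0: n - 1, **{v: 1 for v in range(1, n)}}
rng = random.Random(42)
node_est = est_mean_degree_nodes(degrees, 2000, rng)
edge_est = est_mean_degree_edges(edges, degrees, 2000, rng)
# node_est stays near 1.98, while edge_est is pulled toward n/2 = 50.
```

Same graph, same budget: only the sampling design differs, and one estimate is off by more than an order of magnitude, which is exactly the kind of mismatch the study investigates.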
Semantically Sampling in Heterogeneous Social Networks
Abstract
Online social network sampling identifies a representative subnetwork that preserves certain graph properties given heterogeneous semantics, with the full network not observed during sampling. This study presents a property, the Relational Profile, to account for conditional dependency of node and relation type semantics in a network, and a sampling method to preserve that property. We show that the proposed sampling method better preserves the Relational Profile. Next, the Relational Profile can be used to design features that boost network prediction. Finally, our sampled network trains more accurate prediction models than other sampling baselines.
Improving Daily Deals Recommendation Using Explore-Then-Exploit Strategies
 Information Retrieval (manuscript)
Abstract
Daily-Deals Sites (DDSs) enable local businesses, such as restaurants and stores, to promote their products and services and to increase their sales by offering customers significantly reduced prices. If a customer finds a relevant deal in the catalog of electronic coupons, she can purchase it and the DDS receives a commission. Thus, offering relevant deals to customers maximizes the profitability of the DDS. An immediate strategy, therefore, would be to apply existing recommendation algorithms to suggest deals that are potentially relevant to specific customers, enabling more appealing, effective, and personalized catalogs. However, this strategy may be innocuous because (i) most of the customers are sporadic bargain hunters, and thus past preference data is extremely sparse, (ii) deals have a short living period, and thus data is extremely volatile, and (iii) customers' tastes and interests may undergo temporal drifts. In order to address such a particularly challenging scenario, we propose a new algorithm for daily deals recommendation based on an explore-then-exploit strategy. Basically, we choose a fraction of the customers to gather feedback on the current catalog in the exploration phase, and the remaining customers to receive improved recommendations based on the previously gathered feedback in a posterior exploitation phase. During exploration, a co-purchase network structure is updated with customer feedback …
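The explore-then-exploit split described above can be sketched as follows. This is a schematic of the general strategy only (the paper additionally maintains a co-purchase network); `true_pref` simulates purchase feedback and all names are hypothetical.

```python
import random
from collections import Counter

def explore_then_exploit(customers, deals, true_pref, explore_frac=0.2, seed=0):
    """Show random deals to a fraction of customers (exploration),
    then recommend the best-performing deal to the rest (exploitation)."""
    rng = random.Random(seed)
    customers = list(customers)
    rng.shuffle(customers)
    cut = int(len(customers) * explore_frac)
    purchases, shown = Counter(), Counter()
    for c in customers[:cut]:                 # exploration phase
        d = rng.choice(deals)
        shown[d] += 1
        purchases[d] += bool(true_pref(c, d))

    def rate(d):                              # observed conversion rate
        return purchases[d] / shown[d] if shown[d] else 0.0

    best = max(deals, key=rate)
    return [(c, best) for c in customers[cut:]]   # exploitation phase

# Simulated market where every customer only buys deal "B".
recs = explore_then_exploit(range(100), ["A", "B"], lambda c, d: d == "B")
```

The split directly trades a small loss on the exploration cohort for better recommendations to the (larger) exploitation cohort, which matches the sparse, volatile feedback setting the abstract describes.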
Hidden Hazards: Finding Missing Nodes in Large Graph Epidemics
Abstract
Given a noisy or sampled snapshot of an infection in a large graph, can we automatically and reliably recover the truly infected yet somehow missed nodes? And what about the seeds, the nodes from which the infection started to spread? These are important questions in diverse contexts, ranging from epidemiology to social media. In this paper, we address the problem of simultaneously recovering the missing infections and the source nodes of the epidemic given noisy data. We formulate the problem using the Minimum Description Length principle, and propose NetFill, an efficient algorithm that automatically and highly accurately identifies the number and identities of both missing nodes and infection seed nodes. Experimental evaluation on synthetic and real datasets, including data from information cascades over 96 million blog posts and news articles, shows that our method outperforms other baselines, scales near-linearly, and is highly effective in recovering missing nodes and sources.
Social network sampling using spanning trees
 In International Journal of Modern Physics C (World Scientific), 2015
Abstract
Due to their large scale and limitations on access, most online social networks are hard or infeasible to study and analyze directly in a reasonable amount of time. Hence, network sampling has emerged as a suitable technique for studying and analyzing real networks. The main goal of sampling online social networks is to construct a small-scale sampled network which preserves the most important properties of the original network. In this paper, we propose two sampling algorithms for sampling online social networks using spanning trees. The first proposed algorithm finds several spanning trees from randomly chosen starting nodes; the edges in these spanning trees are then ranked according to the number of times each edge has appeared in the set of spanning trees found in the given network. The sampled network is then constructed as a subgraph of the original network containing the fraction of nodes that are incident on highly ranked edges. In order to avoid traversing the entire network, the second sampling algorithm uses partial spanning trees; it is otherwise similar to the first algorithm. Several experiments are conducted to examine the performance of the proposed sampling algorithms on well-known real networks. The obtained results, in comparison with other popular sampling methods, demonstrate the efficiency of …
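The first algorithm above can be sketched in a few lines. This is a rough reconstruction from the abstract; the tree-construction method (random DFS here), the ranking, and the stopping rule are assumptions, not the paper's exact choices.

```python
import random
from collections import Counter

def spanning_tree(adj, root, rng):
    """One randomized spanning tree of root's component (random DFS)."""
    visited, tree, stack = {root}, [], [root]
    while stack:
        u = stack.pop()
        nbrs = list(adj[u])
        rng.shuffle(nbrs)
        for v in nbrs:
            if v not in visited:
                visited.add(v)
                tree.append(frozenset((u, v)))
                stack.append(v)
    return tree

def spanning_tree_sample(adj, num_trees, frac, seed=0):
    """Rank edges by how often they appear across several spanning
    trees grown from random roots, then keep nodes incident to the
    top-ranked edges until a fraction `frac` of nodes is collected."""
    rng = random.Random(seed)
    nodes = list(adj)
    counts = Counter()
    for _ in range(num_trees):
        counts.update(spanning_tree(adj, rng.choice(nodes), rng))
    target = int(frac * len(nodes))
    sampled = set()
    for edge, _ in counts.most_common():
        sampled.update(edge)
        if len(sampled) >= target:
            break
    return sampled

# Tiny example: a triangle, sampled at full fraction.
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
sampled = spanning_tree_sample(triangle, num_trees=3, frac=1.0)
```

Edges that recur across many trees are plausibly "backbone" edges, which is the intuition behind ranking by appearance count; the paper's second algorithm would replace `spanning_tree` with a partial traversal to avoid visiting the whole network.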