Results 1–9 of 9
Walking in Facebook: A Case Study of Unbiased Sampling of OSNs
in Proc. IEEE INFOCOM, 2010
Cited by 115 (14 self)
Abstract—With more than 250 million active users [1], Facebook (FB) is currently one of the most important online social networks. Our goal in this paper is to obtain a representative (unbiased) sample of Facebook users by crawling its social graph. In this quest, we consider and implement several candidate techniques. Two approaches that are found to perform well are the Metropolis-Hastings random walk (MHRW) and a reweighted random walk (RWRW). Both have pros and cons, which we demonstrate through a comparison to each other as well as to the “ground truth” (UNI, obtained through true uniform sampling of FB userIDs). In contrast, the traditional Breadth-First Search (BFS) and Random Walk (RW) perform quite poorly, producing substantially biased results. In addition to offline performance assessment, we introduce online formal convergence diagnostics to assess sample quality during the data collection process. We show how these can be used to effectively determine when a random walk sample is of adequate size and quality for subsequent use (i.e., when it is safe to cease sampling). Using these methods, we collect the first, to the best of our knowledge, unbiased sample of Facebook. Finally, we use one of our representative datasets, collected through MHRW, to characterize several key properties of Facebook.
Index Terms—Measurements, online social networks, Facebook, graph sampling, crawling, bias.
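As an illustration of the MHRW idea the abstract describes (a minimal sketch, not code from the paper), the following applies the Metropolis-Hastings acceptance rule to a walk over an adjacency-list graph; the `graph` dictionary layout and function name are assumptions for the example:

```python
import random

def mhrw_sample(graph, start, steps, seed=0):
    """Metropolis-Hastings Random Walk over an undirected graph given
    as an adjacency list (dict: node -> list of neighbours).  A move
    from the current node v to a uniformly chosen neighbour w is
    accepted with probability min(1, deg(v)/deg(w)); otherwise the
    walk stays at v.  This makes the stationary distribution uniform
    over nodes, unlike a plain random walk, which favours
    high-degree nodes."""
    rng = random.Random(seed)
    v = start
    visited = []
    for _ in range(steps):
        w = rng.choice(graph[v])
        if rng.random() <= len(graph[v]) / len(graph[w]):
            v = w  # accept the proposed move
        visited.append(v)  # otherwise record a self-loop at v
    return visited
```

On a star graph, for instance, a plain random walk visits the hub every other step, while the MHRW above visits it roughly in proportion to 1/n, as a uniform sample should.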
Practical recommendations on crawling online social networks
in IEEE Journal on Selected Areas in Communications, 2011
Cited by 34 (1 self)
Our goal in this paper is to develop a practical framework for obtaining a uniform sample of users in an online social network (OSN) by crawling its social graph. Such a sample allows us to estimate any user property, and some topological properties as well. To this end, first, we consider and compare several candidate crawling techniques. Two approaches that can produce approximately uniform samples are the Metropolis-Hastings random walk (MHRW) and a reweighted random walk (RWRW). Both have pros and cons, which we demonstrate through a comparison to each other as well as to the “ground truth.” In contrast, using Breadth-First Search (BFS) or an unadjusted Random Walk (RW) leads to substantially biased results. Second, in addition to offline performance assessment, we introduce online formal convergence diagnostics to assess sample quality during the data collection process. We show how these diagnostics can be used to effectively determine when a random walk sample is of adequate size and quality. Third, as a case study, we apply the above methods to Facebook and collect the first, to the best of our knowledge, representative sample of Facebook users. We make it publicly available and employ it to characterize several key properties of Facebook.
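The reweighting behind RWRW can be sketched as a Hansen-Hurwitz-style estimator (an illustrative reconstruction, not the authors' code): each node visited by a plain random walk is weighted by the inverse of its degree, cancelling the walk's bias toward high-degree nodes:

```python
def rwrw_estimate(samples, degree, prop):
    """Estimate the population mean of a node property `prop` from
    plain random-walk samples.  A plain random walk visits node v
    with probability proportional to degree(v); weighting each visit
    by 1/degree(v) corrects that bias, so the weighted average
    estimates the uniform mean over all nodes."""
    weights = [1.0 / degree(v) for v in samples]
    weighted_sum = sum(w * prop(v) for w, v in zip(weights, samples))
    return weighted_sum / sum(weights)
```

For example, on a star graph whose hub has degree 3, a plain walk over-samples the hub threefold; the 1/degree weights restore the uniform mean of the property.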
Studies in Solution Sampling
in Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence, 2008
Cited by 4 (0 self)
We introduce novel algorithms for generating random solutions from a uniform distribution over the solutions of a Boolean satisfiability problem. Our algorithms operate in two phases. In the first phase, we use a recently introduced SampleSearch scheme to generate biased samples, while in the second phase we correct the bias by using either Sampling/Importance Resampling or the Metropolis-Hastings method. Unlike state-of-the-art algorithms, our algorithms guarantee convergence in the limit. Our empirical results demonstrate the superior performance of our new algorithms over several competing schemes.
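The second-phase Metropolis-Hastings correction mentioned above can be sketched generically (a hedged reconstruction with hypothetical names, not the authors' implementation): the biased samples serve as independence proposals, and the ratio of importance weights drives the chain toward the target distribution:

```python
import random

def mh_correct(candidates, proposal_p, target_p, seed=0):
    """Metropolis-Hastings with an independence proposal: `candidates`
    are samples drawn from a biased proposal distribution.  A
    candidate y replaces the current state x with probability
    min(1, w(y)/w(x)), where w(s) = target_p(s)/proposal_p(s).
    In the limit, the chain's states follow the target distribution,
    which is the convergence guarantee the abstract refers to."""
    rng = random.Random(seed)
    x = candidates[0]
    chain = []
    for y in candidates[1:]:
        w_x = target_p(x) / proposal_p(x)
        w_y = target_p(y) / proposal_p(y)
        if rng.random() <= w_y / w_x:
            x = y  # accept the candidate
        chain.append(x)  # otherwise keep the current state
    return chain
```

In the SAT setting of the paper, `proposal_p` would be the SampleSearch sampling probability of a solution and `target_p` the uniform distribution over solutions; here they are plain dictionaries for illustration.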
A Simple Application of Sampling Importance Resampling (SIR) for Solution Sampling
Cited by 1 (0 self)
Abstract. We introduce a new technique, SampleSearch-SIR, to generate random solutions of a Boolean satisfiability problem from a uniform distribution over the solutions. Our technique operates in two phases. In the first phase, it uses a recently proposed SampleSearch scheme to generate approximately random solutions from the satisfiability problem, and in the second phase it uses the Sampling/Importance Resampling (SIR) principle to reduce the approximation error introduced by SampleSearch. The use of SIR provides a convergence guarantee (in the limit) that none of the current state-of-the-art schemes offer. Our empirical results demonstrate the superior performance and better convergence of SampleSearch-SIR compared to state-of-the-art schemes.
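The SIR step itself can be sketched in a few lines (an illustrative sketch under assumed names, not the paper's code): weight each biased sample by target/proposal, then resample in proportion to the weights:

```python
import random

def sir_resample(samples, proposal_p, target_p, k, seed=0):
    """Sampling/Importance Resampling: attach the importance weight
    target_p(s)/proposal_p(s) to every sample s drawn from the
    biased proposal, then draw k items from the sample set with
    probability proportional to the weights.  As the initial sample
    grows, the resampled set converges to the target distribution."""
    rng = random.Random(seed)
    weights = [target_p(s) / proposal_p(s) for s in samples]
    return rng.choices(samples, weights=weights, k=k)
```

As with the Metropolis-Hastings variant, `proposal_p` stands in for the SampleSearch sampling probability and `target_p` for the uniform distribution over solutions.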
A Fully Bayesian Approach to Assessment of Model Adequacy in Inverse Problems
in Statistical Methodology (submitted), 2012
Cited by 1 (1 self)
We consider the problem of assessing the goodness of fit of a single Bayesian model to the observed data in the inverse problem context. A novel goodness-of-fit test procedure is proposed, based on the construction of reference distributions using the ‘inverse’ part of the given model. This is motivated by an example from palaeoclimatology in which it is of interest to reconstruct past climates using information obtained from fossils deposited in lake sediment. Since climate influences species, the model is built in the forward sense, that is, fossils are assumed to depend upon climate. The model combines ‘modern data’, consisting of observed species composition and the corresponding observed climates, with ‘fossil data’; the latter consists of fossil species composition deposited in lake sediments over thousands of years in the past, for which the corresponding past climates are unknown. Interest focuses on prediction of the unknown past climates, which is the inverse part of the model. Technically, given a model f(Y | X, θ), where Y is the observed data and X is a set of (non-random) covariates, we obtain reference distributions based on the posterior π(X̃ | Y), where X̃ must be interpreted as the unobserved random vector corresponding to the observed covariates X. Put simply, if the posterior distribution π(X̃ | Y) gives high density to the observed covariates X, or equivalently, if the posterior distribution of T(X̃) gives high density to T(X), where T is any appropriate statistic,
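The reference-distribution check can be sketched numerically (a hedged illustration of the idea, not the authors' procedure): given posterior draws of X̃, compare the observed T(X) with the induced distribution of T(X̃) via a two-sided tail probability:

```python
def reference_tail_prob(posterior_draws, x_obs, stat):
    """Two-sided tail probability of the observed statistic T(X)
    under the reference distribution of T(X~) formed from posterior
    draws of X~.  A value near zero means the posterior pi(X~ | Y)
    gives little support to the observed covariates, suggesting
    model inadequacy; a value near one indicates a good fit."""
    t_obs = stat(x_obs)
    ts = [stat(x) for x in posterior_draws]
    below = sum(t <= t_obs for t in ts) / len(ts)
    above = sum(t >= t_obs for t in ts) / len(ts)
    return min(1.0, 2.0 * min(below, above))
```

In the palaeoclimate example, `posterior_draws` would be MCMC samples of the unknown past climates and `stat` any summary of interest; both are placeholders here.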
Predicting Protein Structure with Guided Conformation Space Search
2005
Protein structure prediction is one of the great challenges in structural biology. The ability to accurately predict the three-dimensional structure of proteins would bring about significant scientific advances and would facilitate finding cures and treatments for many diseases. We propose a novel computational framework for protein structure prediction. The novelty of the framework lies in its approach to conformation space search, which is considered to be the primary bottleneck towards consistent, high-resolution prediction. The proposed approach to conformation space search represents a major conceptual shift in protein structure prediction, made possible by combining insights and algorithms from robotics and machine learning with techniques from molecular biology in an innovative manner. The key innovation comes from the insight that target-specific information can effectively guide conformation space search towards biologically relevant regions. The proposed framework achieves biological accuracy and computational efficiency by guiding conformation space search using target-specific information, exploiting characteristics of the target’s energy landscape acquired continuously during search. As search progresses, the continuous integration of these sources of information tailors conformation space search to the particular characteristics of the target. This tailored exploration can overcome the current bottleneck, yielding highly accurate and efficient structure prediction.
Practical Recommendations on Crawling Online Social Networks
2016
Abstract—Our goal in this paper is to develop a practical framework for obtaining a uniform sample of users in an online social network (OSN) by crawling its social graph. Such a sample allows us to estimate any user property, and some topological properties as well. To this end, first, we consider and compare several candidate crawling techniques. Two approaches that can produce approximately uniform samples are the Metropolis-Hastings random walk (MHRW) and a reweighted random walk (RWRW). Both have pros and cons, which we demonstrate through a comparison to each other as well as to the “ground truth.” In contrast, using Breadth-First Search (BFS) or an unadjusted Random Walk (RW) leads to substantially biased results. Second, in addition to offline performance assessment, we introduce online formal convergence diagnostics to assess sample quality during the data collection process. We show how these diagnostics can be used to effectively determine when a random walk sample is of adequate size and quality. Third, as a case study, we apply the above methods to Facebook and collect the first, to the best of our knowledge, representative sample of Facebook users. We make it publicly available and employ it to characterize several key properties of Facebook.
Index Terms—Sampling methods, Social network services,