Results 1 - 10 of 13
Overlapping Community Detection Using Seed Set Expansion
"... Community detection is an important task in network analysis. A community (also referred to as a cluster) is a set of cohesive vertices that have more connections inside the set than outside. In many social and information networks, these communities naturally overlap. For instance, in a social netw ..."
Abstract
-
Cited by 9 (3 self)
- Add to MetaCart
(Show Context)
Community detection is an important task in network analysis. A community (also referred to as a cluster) is a set of cohesive vertices that have more connections inside the set than outside. In many social and information networks, these communities naturally overlap. For instance, in a social network, each vertex in a graph corresponds to an individual who usually participates in multiple communities. One of the most successful techniques for finding overlapping communities is based on local optimization and expansion of a community metric around a seed set of vertices. In this paper, we propose an efficient overlapping community detection algorithm using a seed set expansion approach. In particular, we develop new seeding strategies for a personalized PageRank scheme that optimizes the conductance community score. The key idea of our algorithm is to find good seeds, and then expand these seed sets using the personalized PageRank clustering procedure. Experimental results show that this seed set expansion approach outperforms other state-of-the-art overlapping community detection methods. We also show that our new seeding strategies are better than previous strategies, and are thus effective in finding good overlapping clusters in a graph.
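To make the approach concrete, the following is a minimal Python sketch of the two generic building blocks this abstract describes: an approximate personalized PageRank (PPR) computation via the well-known Andersen-Chung-Lang "push" procedure, followed by a sweep cut that returns the lowest-conductance prefix of the PPR ordering. The graph representation, parameter defaults, and function names are illustrative assumptions, not the paper's implementation, and the authors' new seeding strategies are omitted.

    # Sketch: seed set expansion via approximate personalized PageRank (PPR).
    # Assumes an undirected graph as an adjacency dict {node: set(neighbors)}
    # with no isolated vertices.
    from collections import deque

    def ppr_push(graph, seeds, alpha=0.15, eps=1e-4):
        p = {}                                      # approximate PPR vector
        r = {s: 1.0 / len(seeds) for s in seeds}    # residual mass on seeds
        queue = deque(r)
        while queue:
            u = queue.popleft()
            deg = len(graph[u])
            if r.get(u, 0.0) < eps * deg:
                continue                            # residual too small to push
            ru = r[u]
            p[u] = p.get(u, 0.0) + alpha * ru       # keep alpha of the residual
            r[u] = 0.0
            share = (1 - alpha) * ru / deg          # spread the rest to neighbors
            for v in graph[u]:
                r[v] = r.get(v, 0.0) + share
                if r[v] >= eps * len(graph[v]):
                    queue.append(v)
        return p

    def sweep_cut(graph, p):
        # Scan prefixes of the degree-normalized PPR ordering, track conductance.
        order = sorted(p, key=lambda u: p[u] / len(graph[u]), reverse=True)
        vol_g = sum(len(graph[u]) for u in graph)
        in_set, vol, cut = set(), 0, 0
        best, best_phi = set(), 1.0
        for u in order:
            vol += len(graph[u])
            cut += sum(-1 if v in in_set else 1 for v in graph[u])
            in_set.add(u)
            denom = min(vol, vol_g - vol)
            if denom == 0:
                break
            phi = cut / denom                       # conductance of the prefix
            if phi < best_phi:
                best_phi, best = phi, set(in_set)
        return best, best_phi

A cluster for a given seed set is then sweep_cut(graph, ppr_push(graph, seeds)).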
Flow-based algorithms for local graph clustering
, 2013
"... Given a subset A of vertices of an undirected graph G, the cut-improvement problem asks us to find a subset S that is similar to A but has smaller conductance. An elegant algorithm for this problem has been given by Andersen and Lang [AL08] and requires solving a small number of single-commodity max ..."
Abstract
-
Cited by 5 (1 self)
- Add to MetaCart
Given a subset A of vertices of an undirected graph G, the cut-improvement problem asks us to find a subset S that is similar to A but has smaller conductance. An elegant algorithm for this problem has been given by Andersen and Lang [AL08] and requires solving a small number of single-commodity maximum flow computations over the whole graph G. In this paper, we introduce LocalImprove, the first cut-improvement algorithm that is local, i.e., that runs in time dependent on the size of the input set A rather than on the size of the entire graph. Moreover, LocalImprove achieves this local behavior while closely matching the same theoretical guarantee as the global algorithm of Andersen and Lang. The main application of LocalImprove is to the design of better local-graph-partitioning algorithms. All previously known local algorithms for graph partitioning are random-walk based and can only guarantee an output conductance of Õ(√φopt) when the target set has conductance φopt ∈ [0, 1]. Very recently, Zhu, Lattanzi and Mirrokni [ZLM13] improved this to O(φopt/√Conn), where the internal connectivity parameter Conn ∈ [0, 1] is defined as the reciprocal of the mixing time of the random walk over the graph induced by the target set. This regime is of high practical interest in learning applications, as it corresponds to the case when the target set is a well-connected ground-truth cluster. In this work, we show how to use LocalImprove to obtain a constant approximation O(φopt) as long as Conn/φopt = Ω(1). This yields the first flow-based algorithm for local graph partitioning. Moreover, its performance strictly outperforms those based on random walks and, surprisingly, matches that of the best known global algorithm, which is SDP-based, in this parameter regime [MMV12]. Finally, our results show that spectral methods are not the only viable approach to the construction of local graph partitioning algorithms, and they open the door to the study of algorithms with even better approximation and locality guarantees.
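For reference, the two quantities this abstract trades off are, in standard notation (a reconstruction consistent with the text above, not copied from the paper):

    \phi(S) = \frac{|E(S, V \setminus S)|}{\min\{\mathrm{vol}(S),\, \mathrm{vol}(V \setminus S)\}},
    \qquad
    \mathrm{Conn}(A) = \frac{1}{\tau_{\mathrm{mix}}(G[A])},

where vol(S) is the sum of the degrees of the vertices in S and τ_mix(G[A]) is the mixing time of the random walk on the subgraph induced by the target set A.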
Approximate computation and implicit regularization for very large-scale data analysis
- In Proceedings of the 31st ACM Symposium on Principles of Database Systems
, 2012
"... Database theory and database practice are typically the domain of computer scientists who adopt what may be termed an algorithmic perspective on their data. This perspective is very different than the more statistical perspective adopted by statisticians, scientific computers, machine learners, and ..."
Abstract
-
Cited by 4 (2 self)
- Add to MetaCart
(Show Context)
Database theory and database practice are typically the domain of computer scientists who adopt what may be termed an algorithmic perspective on their data. This perspective is very different from the more statistical perspective adopted by statisticians, scientific computers, machine learners, and others who work on what may be broadly termed statistical data analysis. In this article, I will address fundamental aspects of this algorithmic-statistical disconnect, with an eye to bridging the gap between these two very different approaches. A concept that lies at the heart of this disconnect is that of statistical regularization, a notion that has to do with how robust the output of an algorithm is to the noise properties of the input data. Although it is nearly completely absent from computer science, which historically has taken the input data as given and modeled algorithms discretely, regularization in one form or another is central to nearly every application domain that applies algorithms to noisy data. By using several case studies, I will illustrate, both theoretically and empirically, the nonobvious fact that approximate computation, in and of itself, can implicitly lead to statistical regularization. This and other recent work suggests that, by exploiting in a more principled way the statistical properties implicit in worst-case algorithms, one can in many cases satisfy the bicriteria of having algorithms that are scalable to very large-scale databases and that also have good inferential or predictive properties.
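As a toy illustration of the abstract's central claim (my example, not one of the paper's case studies): truncating the Neumann series for personalized PageRank after a few terms keeps the solution concentrated near the seed, behaving like a shrunken, regularized version of the exact solve.

    # Toy example: early stopping as implicit regularization.
    import numpy as np

    A = np.array([[0, 1, 1, 0],
                  [1, 0, 1, 0],
                  [1, 1, 0, 1],
                  [0, 0, 1, 0]], dtype=float)
    W = A / A.sum(axis=1, keepdims=True)      # row-stochastic walk matrix
    s = np.array([1.0, 0.0, 0.0, 0.0])        # seed distribution
    a = 0.85

    # Exact PPR: x = (1 - a) * (I - a W^T)^{-1} s
    exact = (1 - a) * np.linalg.solve(np.eye(4) - a * W.T, s)

    x, term = np.zeros(4), s.copy()
    for _ in range(3):                         # stop the iteration early
        x += (1 - a) * term
        term = a * W.T @ term
    print(exact)                               # mass spread over the graph
    print(x)                                   # truncated iterate concentrates near the seed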
Constrained fractional set programs and their application in local clustering and community detection
- In ICML
, 2013
"... local clustering and community detection ..."
Semi-supervised Eigenvectors for Locally-biased Learning
"... In many applications, one has side information, e.g., labels that are provided in a semi-supervised manner, about a specific target region of a large data set, and one wants to perform machine learning and data analysis tasks “nearby” that pre-specified target region. Locally-biased problems of this ..."
Abstract
-
Cited by 2 (1 self)
- Add to MetaCart
(Show Context)
In many applications, one has side information, e.g., labels that are provided in a semi-supervised manner, about a specific target region of a large data set, and one wants to perform machine learning and data analysis tasks “nearby” that pre-specified target region. Locally-biased problems of this sort are particularly challenging for popular eigenvector-based machine learning and data analysis tools. At root, the reason is that eigenvectors are inherently global quantities. In this paper, we address this issue by providing a methodology to construct semi-supervised eigenvectors of a graph Laplacian, and we illustrate how these locally-biased eigenvectors can be used to perform locally-biased machine learning. These semi-supervised eigenvectors capture successively-orthogonalized directions of maximum variance, conditioned on being well-correlated with an input seed set of nodes that is assumed to be provided in a semi-supervised manner. We also provide several empirical examples demonstrating how these semi-supervised eigenvectors can be used to perform locally-biased learning.
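In standard locally-biased spectral notation (a hedged reconstruction from the abstract, with L the graph Laplacian, D the degree matrix, and s the seed-set indicator; the symbols are mine), the i-th semi-supervised eigenvector roughly solves

    \begin{aligned}
    \min_{x} \quad & x^\top L x \\
    \text{s.t.} \quad & x^\top D x = 1, \qquad (x^\top D s)^2 \ge \kappa, \qquad x^\top D x_j = 0 \;\; \text{for } j < i,
    \end{aligned}

i.e., a Rayleigh-quotient minimization with a correlation constraint toward the seeds and D-orthogonality to the previously computed vectors.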
Semi-supervised eigenvectors for large-scale locally-biased learning
- Journal of Machine Learning Research
"... In many applications, one has side information, e.g., labels that are provided in a semi-supervised manner, about a specific target region of a large data set, and one wants to perform machine learning and data analysis tasks “nearby ” that prespecified target region. For example, one might be inter ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
In many applications, one has side information, e.g., labels that are provided in a semi-supervised manner, about a specific target region of a large data set, and one wants to perform machine learning and data analysis tasks “nearby” that prespecified target region. For example, one might be interested in the clustering structure of a data graph near a prespecified “seed set” of nodes, or one might be interested in finding partitions in an image that are near a prespecified “ground truth” set of pixels. Locally-biased problems of this sort are particularly challenging for popular eigenvector-based machine learning and data analysis tools. At root, the reason is that eigenvectors are inherently global quantities, thus limiting the applicability of eigenvector-based methods in situations where one is interested in very local properties of the data. In this paper, we address this issue by providing a methodology to construct semi-supervised eigenvectors of a graph Laplacian, and we illustrate how these locally-biased eigenvectors can be used to perform locally-biased machine learning. These semi-supervised eigenvectors capture successively-orthogonalized directions of maximum variance, conditioned on being well-correlated with an input seed set of nodes that is assumed to be provided in a semi-supervised manner.
Bayesian discovery of threat networks
- CoRR
"... Abstract—A novel unified Bayesian framework for network detection is developed, under which a detection algorithm is derived based on random walks on graphs. The algorithm detects threat networks using partial observations of their activity, and is proved to be optimum in the Neyman-Pearson sense. T ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
(Show Context)
A novel unified Bayesian framework for network detection is developed, under which a detection algorithm is derived based on random walks on graphs. The algorithm detects threat networks using partial observations of their activity, and is proved to be optimum in the Neyman-Pearson sense. The algorithm is defined by a graph, at least one observation, and a diffusion model for threat. A link to well-known spectral detection methods is provided, and the equivalence of the random walk and harmonic solutions to the Bayesian formulation is proven. A general diffusion model is introduced that utilizes spatio-temporal relationships between vertices, and is used for a specific space-time formulation that leads to significant performance improvements on coordinated covert networks. This performance is demonstrated using a new hybrid mixed-membership blockmodel introduced to simulate random covert networks with realistic properties.
Index Terms—Network detection, optimal detection, maximum likelihood detection, community detection, network theory (graphs), graph theory, diffusion on graphs, random walks on graphs, dynamic network models, Bayesian methods, harmonic analysis, eigenvector centrality, Laplace equations.
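A generic sketch of the random-walk-diffusion idea in this abstract (the Bayesian derivation, the optimality proof, and the space-time model are not reproduced; the scoring rule and parameters below are assumptions for illustration only):

    # Generic sketch: rank vertices by diffusing partial observations of
    # activity over the graph with a damped random walk.
    import numpy as np

    def diffusion_scores(A, observed, alpha=0.9):
        n = A.shape[0]
        P = A / A.sum(axis=1, keepdims=True)   # random-walk transition matrix
        z = np.zeros(n)
        z[list(observed)] = 1.0                # partial observations of activity
        # Fixed point of h = z + alpha * P @ h, a harmonic-style smoothing.
        return np.linalg.solve(np.eye(n) - alpha * P, z)

Vertices with the highest scores form the candidate threat network.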
Local Network Community Detection with Continuous Optimization of Conductance and Weighted Kernel K-Means
- Twan van Laarhoven
, 2016
"... Abstract Local network community detection is the task of finding a single community of nodes concentrated around few given seed nodes in a localized way. Conductance is a popular objective function used in many algorithms for local community detection. This paper studies a continuous relaxation of ..."
Abstract
- Add to MetaCart
(Show Context)
Local network community detection is the task of finding a single community of nodes concentrated around a few given seed nodes in a localized way. Conductance is a popular objective function used in many algorithms for local community detection. This paper studies a continuous relaxation of conductance. We show that continuous optimization of this objective still leads to discrete communities. We investigate the relation of conductance with weighted kernel k-means for a single community, which leads to the introduction of a new objective function, σ-conductance. Conductance is obtained by setting σ to 0. Two algorithms, EMc and PGDc, are proposed to locally optimize σ-conductance and automatically tune the parameter σ. They are based on expectation maximization and projected gradient descent, respectively. We prove locality and give performance guarantees for EMc and PGDc for a class of dense and well-separated communities centered around the seeds. Experiments are conducted on networks with ground-truth communities, comparing against state-of-the-art graph diffusion algorithms for conductance optimization. On large graphs, the results indicate that EMc and PGDc stay localized and produce communities most similar to the ground truth, while the graph diffusion algorithms generate large communities of lower quality.
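A hedged sketch of the projected-gradient idea, applied here to the plain conductance relaxation phi(x) = (x' L x) / (x' D x) over x in [0,1]^n rather than the paper's σ-conductance (the σ term, step-size tuning, and stopping rules of PGDc are omitted; parameter values are assumptions):

    # Projected gradient descent on a continuous conductance relaxation.
    import numpy as np

    def pgd_conductance(A, seeds, steps=200, lr=0.1):
        d = A.sum(axis=1)
        L = np.diag(d) - A                         # graph Laplacian
        x = np.zeros(len(d))
        x[list(seeds)] = 1.0
        for _ in range(steps):
            num, den = x @ L @ x, x @ (d * x)
            # Gradient of the quotient: 2 (L x - phi(x) D x) / (x' D x)
            grad = 2 * (L @ x - (num / den) * (d * x)) / den
            x = np.clip(x - lr * grad, 0.0, 1.0)   # project onto [0, 1]^n
            x[list(seeds)] = 1.0                   # keep the seeds inside
        return x                                   # near-binary in practice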
Anti-differentiating approximation algorithms: A case study with min-cuts, spectral, and flow
"... We formalize and illustrate the general concept of algorithmic anti-differentiation: given an algorith-mic procedure, e.g., an approximation algorithm for which worst-case approximation guarantees are available or a heuristic that has been engi-neered to be practically-useful but for which a pre-cis ..."
Abstract
- Add to MetaCart
We formalize and illustrate the general concept of algorithmic anti-differentiation: given an algorithmic procedure, e.g., an approximation algorithm for which worst-case approximation guarantees are available or a heuristic that has been engineered to be practically useful but for which a precise theoretical understanding is lacking, an algorithmic anti-derivative is a precise statement of an optimization problem that is exactly solved by that procedure. We explore this concept with a case study of approximation algorithms for finding locally-biased partitions in data graphs, demonstrating connections between min-cut objectives, a personalized version of the popular PageRank vector, and the highly effective “push” procedure for computing an approximation to personalized PageRank. We show, for example, that this latter algorithm solves (exactly, but implicitly) an ℓ1-regularized ℓ2-regression problem, a fact that helps to explain its excellent performance in practice. We expect that, when available, these implicit optimization problems will be critical for rationalizing and predicting the performance of many approximation algorithms on realistic data.
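Schematically, an ℓ1-regularized ℓ2-regression problem has the form (notation mine; the paper derives the specific data and regularization induced by the push procedure):

    \min_{x} \; \tfrac{1}{2}\, \lVert Ax - b \rVert_2^2 + \kappa\, \lVert x \rVert_1,

so the claim is that running push with a given truncation parameter implicitly fixes a particular A, b, and κ and returns that problem's exact minimizer.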
Semi-supervised Eigenvectors for Large-scale Locally-biased Learning
"... In many applications, one has side information, e.g., labels that are provided in a semi-supervised manner, about a specific target region of a large data set, and one wants to perform machine learning and data analysis tasks “nearby ” that prespecified target region. For example, one might be inter ..."
Abstract
- Add to MetaCart
In many applications, one has side information, e.g., labels that are provided in a semi-supervised manner, about a specific target region of a large data set, and one wants to perform machine learning and data analysis tasks “nearby” that prespecified target region. For example, one might be interested in the clustering structure of a data graph near a prespecified “seed set” of nodes, or one might be interested in finding partitions in an image that are near a prespecified “ground truth” set of pixels. Locally-biased problems of this sort are particularly challenging for popular eigenvector-based machine learning and data analysis tools. At root, the reason is that eigenvectors are inherently global quantities, thus limiting the applicability of eigenvector-based methods in situations where one is interested in very local properties of the data. In this paper, we address this issue by providing a methodology to construct semi-supervised eigenvectors of a graph Laplacian, and we illustrate how these locally-biased eigenvectors can be used to perform locally-biased machine learning. These semi-supervised eigenvectors capture successively-orthogonalized directions of maximum variance, conditioned on being well-correlated with an input seed set of nodes that is assumed to be provided in a semi-supervised manner. We show that these semi-supervised eigenvectors can be computed quickly as the solution to a system of linear equations; and we also describe several variants of our basic method that have improved scaling properties. We provide several empirical examples demonstrating how these semi-supervised eigenvectors can be used to perform locally-biased learning; and we discuss the relationship between our results and recent machine learning algorithms that use global eigenvectors of the graph Laplacian.
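Since the abstract notes that each vector reduces to a linear solve, here is an illustrative numpy sketch of one such solve (the toy graph, the choice of γ, and the seed handling are my assumptions, not values from the paper):

    # One semi-supervised eigenvector as a linear solve (illustrative).
    import numpy as np

    A = np.array([[0, 1, 1, 0],
                  [1, 0, 1, 1],
                  [1, 1, 0, 1],
                  [0, 1, 1, 0]], dtype=float)
    d = A.sum(axis=1)
    L = np.diag(d) - A                        # graph Laplacian
    D = np.diag(d)

    s = np.array([1.0, 0.0, 0.0, 0.0])        # seed-set indicator
    s = s - (d @ s) / d.sum()                 # D-orthogonalize against all-ones

    gamma = -0.5                              # gamma < 0 keeps L - gamma*D positive definite
    x = np.linalg.solve(L - gamma * D, D @ s)
    x = x / np.sqrt(x @ D @ x)                # normalize so that x' D x = 1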