Results 1-10 of 13
Overlapping Community Detection Using Seed Set Expansion
Abstract

Cited by 9 (3 self)
Community detection is an important task in network analysis. A community (also referred to as a cluster) is a set of cohesive vertices that have more connections inside the set than outside. In many social and information networks, these communities naturally overlap. For instance, in a social network, each vertex in a graph corresponds to an individual who usually participates in multiple communities. One of the most successful techniques for finding overlapping communities is based on local optimization and expansion of a community metric around a seed set of vertices. In this paper, we propose an efficient overlapping community detection algorithm using a seed set expansion approach. In particular, we develop new seeding strategies for a personalized PageRank scheme that optimizes the conductance community score. The key idea of our algorithm is to find good seeds, and then expand these seed sets using the personalized PageRank clustering procedure. Experimental results show that this seed set expansion approach outperforms other state-of-the-art overlapping community detection methods. We also show that our new seeding strategies are better than previous strategies, and are thus effective in finding good overlapping clusters in a graph.
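The expansion step described above can be sketched in a few lines: compute a personalized PageRank vector from the seed, rank vertices by degree-normalized PPR score, and take the prefix of the ranking with minimum conductance (a "sweep cut"). This is an illustrative sketch, not the authors' implementation: the graph, the parameter values, and the dense power iteration (the paper would use a local approximation scheme) are all assumptions.

```python
import numpy as np

def conductance(A, S):
    """phi(S) = cut(S, complement) / min(vol(S), vol(complement))."""
    mask = np.zeros(A.shape[0], dtype=bool)
    mask[list(S)] = True
    cut = A[mask][:, ~mask].sum()
    return cut / min(A[mask].sum(), A[~mask].sum())

def ppr_sweep(A, seed, alpha=0.15, iters=500):
    """Seed-set expansion: personalized PageRank followed by a sweep cut."""
    n = A.shape[0]
    d = A.sum(axis=1)
    P = A / d[:, None]                    # row-stochastic random-walk matrix
    s = np.zeros(n); s[seed] = 1.0
    x = s.copy()
    for _ in range(iters):                # dense power iteration (illustrative)
        x = alpha * s + (1 - alpha) * (P.T @ x)
    order = np.argsort(-x / d)            # degree-normalized PPR ranking
    best, best_phi = None, np.inf
    for k in range(1, n):                 # sweep over prefixes of the ranking
        phi = conductance(A, order[:k])
        if phi < best_phi:
            best, best_phi = set(order[:k].tolist()), phi
    return best, best_phi

# Two triangles joined by a single edge; seeding inside one triangle
# should recover that triangle as the minimum-conductance community.
A = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0
community, phi = ppr_sweep(A, seed=0)
```

On this toy graph the sweep returns the seed's triangle with conductance 1/7 (one cut edge over volume 7).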
Flow-based algorithms for local graph clustering
, 2013
Abstract

Cited by 5 (1 self)
Given a subset A of vertices of an undirected graph G, the cut-improvement problem asks us to find a subset S that is similar to A but has smaller conductance. An elegant algorithm for this problem has been given by Andersen and Lang [AL08] and requires solving a small number of single-commodity maximum flow computations over the whole graph G. In this paper, we introduce LocalImprove, the first cut-improvement algorithm that is local, i.e., that runs in time dependent on the size of the input set A rather than on the size of the entire graph. Moreover, LocalImprove achieves this local behavior while closely matching the same theoretical guarantee as the global algorithm of Andersen and Lang. The main application of LocalImprove is to the design of better local-graph-partitioning algorithms. All previously known local algorithms for graph partitioning are random-walk based and can only guarantee an output conductance of Õ(√φopt) when the target set has conductance φopt ∈ [0, 1]. Very recently, Zhu, Lattanzi and Mirrokni [ZLM13] improved this to O(φopt/√Conn), where the internal connectivity parameter Conn ∈ [0, 1] is defined as the reciprocal of the mixing time of the random walk over the graph induced by the target set. This regime is of high practical interest in learning applications, as it corresponds to the case when the target set is a well-connected ground-truth cluster. In this work, we show how to use LocalImprove to obtain a constant approximation O(φopt) as long as Conn/φopt = Ω(1). This yields the first flow-based algorithm for local graph partitioning. Moreover, its performance strictly outperforms the ones based on random walks and surprisingly matches that of the best known global algorithm, which is SDP-based, in this parameter regime [MMV12].
Finally, our results show that spectral methods are not the only viable approach to the construction of local graph partitioning algorithms, and open the door to the study of algorithms with even better approximation and locality guarantees.
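As the abstract notes, each iteration of such cut-improvement algorithms reduces to a single-commodity maximum-flow computation. As a self-contained illustration of that subroutine (a textbook Edmonds-Karp on a dense capacity matrix, not the LocalImprove algorithm itself; the example graph is an assumption):

```python
from collections import deque

def max_flow(cap, s, t):
    """Edmonds-Karp: augment along shortest residual s-t paths until none remain."""
    n = len(cap)
    flow = [[0] * n for _ in range(n)]
    total = 0
    while True:
        parent = [-1] * n
        parent[s] = s
        q = deque([s])
        while q and parent[t] == -1:       # BFS in the residual graph
            u = q.popleft()
            for v in range(n):
                if parent[v] == -1 and cap[u][v] - flow[u][v] > 0:
                    parent[v] = u
                    q.append(v)
        if parent[t] == -1:                # no augmenting path: flow is maximum
            return total
        bottleneck, v = float('inf'), t
        while v != s:                      # residual bottleneck along the path
            bottleneck = min(bottleneck, cap[parent[v]][v] - flow[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:                      # push the bottleneck along the path
            flow[parent[v]][v] += bottleneck
            flow[v][parent[v]] -= bottleneck
            v = parent[v]
        total += bottleneck

# Source 0, sink 3; by max-flow/min-cut duality the answer equals
# the capacity of the minimum s-t cut.
cap = [[0, 3, 2, 0],
       [0, 0, 1, 2],
       [0, 0, 0, 3],
       [0, 0, 0, 0]]
```

Cut-improvement methods embed the input set A into such a flow network and read the improved set off the minimum cut.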
Approximate computation and implicit regularization for very large-scale data analysis
 In Proceedings of the 31st ACM Symposium on Principles of Database Systems
, 2012
Abstract

Cited by 4 (2 self)
Database theory and database practice are typically the domain of computer scientists who adopt what may be termed an algorithmic perspective on their data. This perspective is very different from the more statistical perspective adopted by statisticians, scientific computing researchers, machine learners, and others who work on what may be broadly termed statistical data analysis. In this article, I will address fundamental aspects of this algorithmic-statistical disconnect, with an eye to bridging the gap between these two very different approaches. A concept that lies at the heart of this disconnect is that of statistical regularization, a notion that has to do with how robust the output of an algorithm is to the noise properties of the input data. Although it is nearly completely absent from computer science, which historically has taken the input data as given and modeled algorithms discretely, regularization in one form or another is central to nearly every application domain that applies algorithms to noisy data. By using several case studies, I will illustrate, both theoretically and empirically, the non-obvious fact that approximate computation, in and of itself, can implicitly lead to statistical regularization. This and other recent work suggests that, by exploiting in a more principled way the statistical properties implicit in worst-case algorithms, one can in many cases satisfy the bicriteria of having algorithms that are scalable to very large-scale databases and that also have good inferential or predictive properties.
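A toy illustration of the central claim (my own example, not one of the article's case studies): gradient descent on an ordinary least-squares objective, stopped long before convergence, returns a shrunken, ridge-like version of the exact solution. The approximate computation itself acts as a regularizer, with no explicit penalty term. The data and step-size choice below are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))   # synthetic design matrix
b = rng.standard_normal(50)         # synthetic responses

# Exact least-squares solution.
x_exact = np.linalg.lstsq(A, b, rcond=None)[0]

# Early-stopped gradient descent on ||Ax - b||^2, started at zero.
eta = 0.5 / np.linalg.norm(A, 2) ** 2   # step size below 1/lambda_max(A^T A)
x_early = np.zeros(10)
for _ in range(5):                       # stop far short of convergence
    x_early = x_early - eta * A.T @ (A @ x_early - b)
```

Along each eigendirection of A^T A, the early-stopped iterate carries a damped coefficient of the exact solution, which is exactly the shrinkage pattern of ridge regression.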
Constrained fractional set programs and their application in local clustering and community detection
 In ICML
, 2013
Semi-supervised Eigenvectors for Locally-biased Learning
Abstract

Cited by 2 (1 self)
In many applications, one has side information, e.g., labels that are provided in a semi-supervised manner, about a specific target region of a large data set, and one wants to perform machine learning and data analysis tasks “nearby” that pre-specified target region. Locally-biased problems of this sort are particularly challenging for popular eigenvector-based machine learning and data analysis tools. At root, the reason is that eigenvectors are inherently global quantities. In this paper, we address this issue by providing a methodology to construct semi-supervised eigenvectors of a graph Laplacian, and we illustrate how these locally-biased eigenvectors can be used to perform locally-biased machine learning. These semi-supervised eigenvectors capture successively-orthogonalized directions of maximum variance, conditioned on being well-correlated with an input seed set of nodes that is assumed to be provided in a semi-supervised manner. We also provide several empirical examples demonstrating how these semi-supervised eigenvectors can be used to perform locally-biased learning.
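The flavor of such locally-biased vectors can be seen with a much simplified sketch (not the paper's construction, which instead tunes a shift γ below λ₂ of a generalized eigenproblem to meet a seed-correlation constraint): solving a shifted Laplacian system against a seed indicator yields a vector that is smooth on the graph yet concentrated near the seed. The path graph and the ad hoc shift tau below are assumptions for illustration.

```python
import numpy as np

def path_laplacian(n):
    """Combinatorial Laplacian L = D - A of an n-node path graph."""
    A = np.zeros((n, n))
    for i in range(n - 1):
        A[i, i + 1] = A[i + 1, i] = 1.0
    return np.diag(A.sum(axis=1)) - A

n, seed, tau = 20, 2, 0.5
L = path_laplacian(n)
s = np.zeros(n); s[seed] = 1.0

# Seed-biased vector: a shifted-Laplacian solve against the seed indicator.
# (The actual method chooses the shift to satisfy a correlation constraint
#  with the seed set; tau here is an arbitrary positive regularizer.)
x = np.linalg.solve(L + tau * np.eye(n), s)
x /= np.linalg.norm(x)
```

Because (L + τI) is an M-matrix, the solution is entrywise positive and peaks at the seed vertex, decaying with graph distance, which is exactly the "locally-biased" behavior global eigenvectors lack.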
Semi-supervised eigenvectors for large-scale locally-biased learning
 Journal of Machine Learning Research
Abstract

Cited by 1 (0 self)
In many applications, one has side information, e.g., labels that are provided in a semi-supervised manner, about a specific target region of a large data set, and one wants to perform machine learning and data analysis tasks “nearby” that pre-specified target region. For example, one might be interested in the clustering structure of a data graph near a pre-specified “seed set” of nodes, or one might be interested in finding partitions in an image that are near a pre-specified “ground truth” set of pixels. Locally-biased problems of this sort are particularly challenging for popular eigenvector-based machine learning and data analysis tools. At root, the reason is that eigenvectors are inherently global quantities, thus limiting the applicability of eigenvector-based methods in situations where one is interested in very local properties of the data. In this paper, we address this issue by providing a methodology to construct semi-supervised eigenvectors of a graph Laplacian, and we illustrate how these locally-biased eigenvectors can be used to perform locally-biased machine learning. These semi-supervised eigenvectors capture successively-orthogonalized directions of maximum variance, conditioned on being well-correlated with an input seed set of nodes that is assumed to be provided in a semi-supervised manner.
Bayesian discovery of threat networks
 CoRR
Abstract

Cited by 1 (0 self)
Abstract—A novel unified Bayesian framework for network detection is developed, under which a detection algorithm is derived based on random walks on graphs. The algorithm detects threat networks using partial observations of their activity, and is proved to be optimum in the Neyman-Pearson sense. The algorithm is defined by a graph, at least one observation, and a diffusion model for threat. A link to well-known spectral detection methods is provided, and the equivalence of the random walk and harmonic solutions to the Bayesian formulation is proven. A general diffusion model is introduced that utilizes spatiotemporal relationships between vertices, and is used for a specific space-time formulation that leads to significant performance improvements on coordinated covert networks. This performance is demonstrated using a new hybrid mixed-membership blockmodel introduced to simulate random covert networks with realistic properties. Index Terms—Network detection, optimal detection, maximum likelihood detection, community detection, network theory (graphs), graph theory, diffusion on graphs, random walks on graphs, dynamic network models, Bayesian methods, harmonic analysis, eigenvector centrality, Laplace equations.
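The "harmonic solution" mentioned in the abstract can be illustrated with the standard harmonic extension on a graph (a generic sketch, not this paper's threat-specific diffusion model): fix scores at the observed vertices and require every unobserved vertex's score to equal the average of its neighbors', which reduces to a single Laplacian linear solve. The example graph is an assumption.

```python
import numpy as np

def harmonic_scores(A, labeled):
    """Harmonic extension: hold labeled vertices fixed and solve
    L_uu x_u = -L_ul x_l, so each unlabeled score is the (weighted)
    average of its neighbors' scores."""
    n = A.shape[0]
    L = np.diag(A.sum(axis=1)) - A            # combinatorial Laplacian
    lab = sorted(labeled)
    unl = [i for i in range(n) if i not in labeled]
    x = np.zeros(n)
    for i, val in labeled.items():
        x[i] = val
    x[unl] = np.linalg.solve(L[np.ix_(unl, unl)], -L[np.ix_(unl, lab)] @ x[lab])
    return x

# 5-node path with both endpoints observed: the harmonic extension
# interpolates linearly between the two fixed endpoint scores.
A = np.zeros((5, 5))
for i in range(4):
    A[i, i + 1] = A[i + 1, i] = 1.0
x = harmonic_scores(A, {0: 0.0, 4: 1.0})
```

The same scores arise as absorption probabilities of a random walk stopped at the labeled vertices, which is the random-walk/harmonic equivalence the abstract alludes to.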
Local Network Community Detection with Continuous Optimization of Conductance and Weighted Kernel K-Means
Twan van Laarhoven
, 2016
Abstract
Local network community detection is the task of finding a single community of nodes concentrated around a few given seed nodes in a localized way. Conductance is a popular objective function used in many algorithms for local community detection. This paper studies a continuous relaxation of conductance. We show that continuous optimization of this objective still leads to discrete communities. We investigate the relation of conductance with weighted kernel k-means for a single community, which leads to the introduction of a new objective function, σ-conductance. Conductance is obtained by setting σ to 0. Two algorithms, EMc and PGDc, are proposed to locally optimize σ-conductance and automatically tune the parameter σ. They are based on expectation maximization and projected gradient descent, respectively. We prove locality and give performance guarantees for EMc and PGDc for a class of dense and well-separated communities centered around the seeds. Experiments are conducted on networks with ground-truth communities, comparing to state-of-the-art graph diffusion algorithms for conductance optimization. On large graphs, results indicate that EMc and PGDc stay localized and produce communities most similar to the ground truth, while graph diffusion algorithms generate large communities of lower quality.
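The projected-gradient idea behind a method like PGDc can be sketched generically (this is a bare-bones box-constrained relaxation of a cut objective with an ad hoc seed reward; it is not the paper's σ-conductance objective, step-size rule, or stopping criterion): take gradient steps on x^T L x − μ·x[seed] and clip each iterate back into the box [0, 1]^n. The graph and all parameters below are assumptions.

```python
import numpy as np

# Two triangles joined by one edge; the seed sits in the first triangle.
A = np.zeros((6, 6))
for a, b in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[a, b] = A[b, a] = 1.0
L = np.diag(A.sum(axis=1)) - A          # combinatorial Laplacian

seed, mu, eta, iters = 0, 4.0, 0.02, 10
s = np.zeros(6); s[seed] = 1.0

x = s.copy()                             # start from the seed indicator
for _ in range(iters):
    grad = 2 * L @ x - mu * s            # gradient of x^T L x - mu * x[seed]
    x = np.clip(x - eta * grad, 0.0, 1.0)  # projection onto the box [0,1]^n
```

After a few steps the relaxed indicator stays pinned at the seed and spreads into the seed's triangle before reaching the far one, which is the localized behavior the continuous relaxation is meant to exploit.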
Anti-differentiating approximation algorithms: A case study with min-cuts, spectral, and flow
Abstract
We formalize and illustrate the general concept of algorithmic anti-differentiation: given an algorithmic procedure, e.g., an approximation algorithm for which worst-case approximation guarantees are available or a heuristic that has been engineered to be practically useful but for which a precise theoretical understanding is lacking, an algorithmic anti-derivative is a precise statement of an optimization problem that is exactly solved by that procedure. We explore this concept with a case study of approximation algorithms for finding locally-biased partitions in data graphs, demonstrating connections between min-cut objectives, a personalized version of the popular PageRank vector, and the highly effective “push” procedure for computing an approximation to personalized PageRank. We show, for example, that this latter algorithm solves (exactly, but implicitly) an ℓ1-regularized ℓ2-regression problem, a fact that helps to explain its excellent performance in practice. We expect that, when available, these implicit optimization problems will be critical for rationalizing and predicting the performance of many approximation algorithms on realistic data.
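The "push" procedure discussed here can be sketched as follows (an Andersen-Chung-Lang-style variant; the parameter values, queue discipline, and example graph are illustrative assumptions): maintain a solution vector p and a residual r, and whenever some vertex u holds residual at least eps·deg(u), bank an α-fraction of r[u] into p[u] and spread the remainder uniformly over u's neighbors.

```python
def push_ppr(adj, seed, alpha=0.15, eps=1e-4):
    """Approximate personalized PageRank via repeated local 'push' operations."""
    p = {u: 0.0 for u in adj}
    r = {u: 0.0 for u in adj}
    r[seed] = 1.0
    work = [seed]
    while work:
        u = work.pop()
        if r[u] < eps * len(adj[u]):
            continue                       # residual at u already small enough
        p[u] += alpha * r[u]               # bank an alpha-fraction as PPR mass
        share = (1 - alpha) * r[u] / len(adj[u])
        r[u] = 0.0
        for v in adj[u]:                   # spread the rest to the neighbors
            r[v] += share
            if r[v] >= eps * len(adj[v]):
                work.append(v)
    return p, r

# Two triangles joined by one edge, seeded at vertex 0.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
       3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
p, r = push_ppr(adj, seed=0)
```

Each push only touches one vertex and its neighbors, which is why the procedure is local; the implicit ℓ1-regularized regression view described in the abstract explains why the resulting approximation is also sparse.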
Semi-supervised Eigenvectors for Large-scale Locally-biased Learning
Abstract
In many applications, one has side information, e.g., labels that are provided in a semi-supervised manner, about a specific target region of a large data set, and one wants to perform machine learning and data analysis tasks “nearby” that pre-specified target region. For example, one might be interested in the clustering structure of a data graph near a pre-specified “seed set” of nodes, or one might be interested in finding partitions in an image that are near a pre-specified “ground truth” set of pixels. Locally-biased problems of this sort are particularly challenging for popular eigenvector-based machine learning and data analysis tools. At root, the reason is that eigenvectors are inherently global quantities, thus limiting the applicability of eigenvector-based methods in situations where one is interested in very local properties of the data. In this paper, we address this issue by providing a methodology to construct semi-supervised eigenvectors of a graph Laplacian, and we illustrate how these locally-biased eigenvectors can be used to perform locally-biased machine learning. These semi-supervised eigenvectors capture successively-orthogonalized directions of maximum variance, conditioned on being well-correlated with an input seed set of nodes that is assumed to be provided in a semi-supervised manner. We show that these semi-supervised eigenvectors can be computed quickly as the solution to a system of linear equations; and we also describe several variants of our basic method that have improved scaling properties. We provide several empirical examples demonstrating how these semi-supervised eigenvectors can be used to perform locally-biased learning; and we discuss the relationship between our results and recent machine learning algorithms that use global eigenvectors of the graph Laplacian.