Results 1-10 of 25
Denser than the Densest Subgraph: Extracting Optimal Quasi-Cliques with Quality Guarantees
Abstract

Cited by 17 (8 self)
Finding dense subgraphs is an important graph-mining task with many applications. Given that the direct optimization of edge density is not meaningful, as even a single edge achieves maximum density, research has focused on optimizing alternative density functions. A very popular such function is the average degree, whose maximization leads to the well-known densest-subgraph notion. Surprisingly enough, however, densest subgraphs are typically large graphs, with small edge density and large diameter. In this paper, we define a novel density function, which gives subgraphs of much higher quality than densest subgraphs: the graphs found by our method are compact, dense, and have smaller diameter. We show that the proposed function can be derived from a general framework, which includes other important density functions as subcases and for which we show interesting general theoretical properties. To optimize the proposed function we provide an additive approximation algorithm and a local-search heuristic. Both algorithms are very efficient and scale well to large graphs. We evaluate our algorithms on real and synthetic datasets, and we also devise several application studies as variants of our original problem. When compared with the method that finds the subgraph of the largest average degree, our algorithms return denser subgraphs with smaller diameter. Finally, we discuss new interesting research directions that our problem leaves open.
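The abstract's opening contrast can be made concrete with a short sketch (illustrative code, not from the paper): edge density |E|/C(|V|,2) is saturated by a single edge, while the average-degree objective |E|/|V| behind the densest-subgraph notion keeps rewarding larger dense subgraphs.

```python
def edge_density(n_vertices, n_edges):
    """Fraction of possible edges present: |E| / C(|V|, 2)."""
    if n_vertices < 2:
        return 0.0
    return n_edges / (n_vertices * (n_vertices - 1) / 2)

def average_degree_density(n_vertices, n_edges):
    """Objective of the densest-subgraph problem: |E| / |V|."""
    return n_edges / n_vertices if n_vertices else 0.0

# A single edge already saturates edge density ...
assert edge_density(2, 1) == 1.0
# ... while average degree still distinguishes larger, denser subgraphs:
assert average_degree_density(3, 3) == 1.0    # triangle
assert average_degree_density(5, 10) == 2.0   # 5-clique
```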
FENNEL: Streaming Graph Partitioning for Massive Scale Graphs
Abstract

Cited by 16 (1 self)
Balanced graph partitioning in the streaming setting is a key problem to enable scalable and efficient computations on massive graph data such as web graphs, knowledge graphs, and graphs arising in the context of online social networks. Two families of heuristics for graph partitioning in the streaming setting are in wide use: place the newly arrived vertex in the cluster with the largest number of neighbors or in the cluster with the least number of non-neighbors. In this work, we introduce a framework which unifies the two seemingly orthogonal heuristics and allows us to quantify the interpolation between them. More generally, the framework enables a well-principled design of scalable, streaming graph partitioning algorithms that are amenable to distributed implementations. We derive a novel one-pass, streaming graph partitioning algorithm and show that it yields significant performance improvements over previous approaches using an extensive set of real-world and synthetic graphs. Surprisingly, despite the fact that our algorithm is a one-pass streaming algorithm, we found its performance to be in many cases comparable to the de facto standard offline software METIS and in some cases even superior. For instance, for the Twitter graph with more than 1.4 billion edges, our method partitions the graph in about 40 minutes, achieving a balanced partition that cuts as few as 6.8% of edges, whereas it took more than 8.5 hours for METIS to produce a balanced partition that cuts 11.98% of edges. We also demonstrate the performance gains of using our graph partitioner when solving a standard PageRank computation in a graph processing platform, with respect to communication cost and runtime.
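The interpolation between the two heuristics can be sketched as a greedy score of the form |N(v) ∩ S| − αγ|S|^(γ−1), in the spirit of the FENNEL objective; the parameter values and the tiny graph below are illustrative, not from the paper.

```python
def fennel_choose(neighbors, partitions, alpha=0.5, gamma=1.5):
    """Pick the partition maximizing |N(v) ∩ S| - α·γ·|S|^(γ-1).
    The first term is the 'most neighbors' heuristic; the size
    penalty supplies the balance pressure of 'fewest non-neighbors'."""
    def score(part):
        return len(neighbors & part) - alpha * gamma * len(part) ** (gamma - 1)
    return max(range(len(partitions)), key=lambda i: score(partitions[i]))

# Stream the vertices of a triangle plus a disjoint edge into 2 parts.
adjacency = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}, 4: {5}, 5: {4}}
partitions = [set(), set()]
for v in sorted(adjacency):
    partitions[fennel_choose(adjacency[v], partitions)].add(v)
# The triangle and the edge end up in separate partitions.
```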
Automatic Extraction of Facts, Relations, and Entities for Web-Scale Knowledge Base Population
, 2012
Abstract

Cited by 3 (0 self)
I hereby solemnly declare that this work was created on my own, using only the resources and tools mentioned. Information taken from other sources or indirectly adopted data and concepts are explicitly acknowledged with references to the respective sources. This work has not been submitted in a process for obtaining an academic degree elsewhere in the same or in similar form. Affidavit (Eidesstattliche Versicherung): I hereby declare under oath that I have produced this work independently and without the use of aids other than those indicated. Content taken from other sources or adopted indirectly ...
Finding the Hierarchy of Dense Subgraphs using Nucleus Decompositions
Abstract

Cited by 3 (1 self)
Finding dense substructures in a graph is a fundamental graph mining operation, with applications in bioinformatics, social networks, and visualization, to name a few. Yet most standard formulations of this problem (like clique, quasi-clique, k-densest subgraph) are NP-hard. Furthermore, the goal is rarely to find the “true optimum”, but to identify many (if not all) dense substructures, understand their distribution in the graph, and ideally determine relationships among them. Current dense subgraph finding algorithms usually optimize some objective, and only find a few such subgraphs without providing any structural relations. We define the nucleus decomposition of a graph, which represents the graph as a forest of nuclei. Each nucleus is a subgraph where smaller cliques are present in many larger cliques. The forest of nuclei is a hierarchy by containment, where the edge density increases as we proceed towards leaf nuclei. Sibling nuclei can have limited intersections, which enables discovering overlapping dense subgraphs. With the right parameters, the nucleus decomposition generalizes the classic notions of k-core and k-truss decompositions. We give provably efficient algorithms for nucleus decompositions, and empirically evaluate their behavior in a variety of real graphs. The tree of nuclei consistently gives a global, hierarchical snapshot of dense substructures, and outputs dense subgraphs of higher quality than other state-of-the-art solutions. Our algorithm can process graphs with tens of millions of edges in less than an hour. ∗Work done while the author was interning at Sandia National Laboratories, Livermore, CA.
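As context for the k-core notion that the decomposition generalizes, here is a minimal peeling sketch (illustrative and quadratic-time; the paper's algorithms are far more efficient):

```python
def core_numbers(adj):
    """Core number of v = largest k such that v survives in the k-core.
    Repeatedly peel a minimum-degree vertex (O(n^2) sketch)."""
    degree = {v: len(ns) for v, ns in adj.items()}
    remaining, core, k = set(adj), {}, 0
    while remaining:
        v = min(remaining, key=lambda u: degree[u])
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core

# A 4-clique {a, b, c, d} with a pendant vertex e attached to d.
adj = {"a": {"b", "c", "d"}, "b": {"a", "c", "d"},
       "c": {"a", "b", "d"}, "d": {"a", "b", "c", "e"}, "e": {"d"}}
core = core_numbers(adj)
# The clique vertices have core number 3; the pendant vertex has 1.
```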
Clustering evolving networks
 CoRR
Abstract

Cited by 2 (0 self)
Roughly speaking, clustering evolving networks aims at detecting structurally dense subgroups in networks that evolve over time. This implies that the subgroups we seek also evolve, which results in many additional tasks compared to clustering static networks. We discuss these additional tasks and the difficulties resulting from them, and present an overview of current approaches to solving these problems. We focus on clustering approaches in online scenarios, i.e., approaches that incrementally use structural information from previous time steps in order to incorporate temporal smoothness or to achieve low running time. Moreover, we describe a collection of real-world networks and generators for synthetic data that are often used for evaluation.
A fresh look on knowledge bases: Distilling named events from news
 In CIKM’14
, 2014
Abstract

Cited by 2 (0 self)
Knowledge bases capture millions of entities such as people, companies, or movies. However, their knowledge of named events like sports finals, political scandals, or natural disasters is fairly limited, as these are continuously emerging entities. This paper presents a method for extracting named events from news articles, reconciling them into a canonicalized representation, and organizing them into fine-grained semantic classes to populate a knowledge base. Our method captures similarity measures among news articles in a multi-view attributed graph, considering textual contents, entity occurrences, and temporal ordering. For distilling canonicalized events from this raw data, we present a novel graph coarsening algorithm based on the information-theoretic principle of minimum description length. The quality of our method is experimentally demonstrated by extracting, organizing, and evaluating 25,000 events from a corpus of 300,000 heterogeneous news articles.
The K-clique Densest Subgraph Problem
Abstract

Cited by 2 (1 self)
Numerous graph mining applications rely on detecting subgraphs which are large near-cliques. Since formulations that are geared towards finding large near-cliques are NP-hard and frequently inapproximable due to connections with the Maximum Clique problem, the poly-time solvable densest subgraph problem, which maximizes the average degree over all possible subgraphs, “lies at the core of large scale data mining” [10]. However, the densest subgraph problem frequently fails to detect large near-cliques in networks. In this work, we introduce the k-clique densest subgraph problem, k ≥ 2. This generalizes the well-studied densest subgraph problem, which is obtained as a special case for k = 2. For k = 3 we obtain a novel formulation which we refer to as the triangle densest subgraph problem: given a graph G(V, E), find a subset of vertices S* such that τ(S*) = max_{S⊆V} t(S)/|S|, where t(S) is the number of triangles induced by S.
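Under this definition, the triangle density of a vertex subset can be computed directly (a brute-force sketch for small sets, for illustration only; the paper's algorithms are efficient):

```python
from itertools import combinations

def triangle_density(adj, S):
    """tau(S) = t(S) / |S|, where t(S) counts triangles induced by S."""
    S = set(S)
    t = sum(1 for u, v, w in combinations(sorted(S), 3)
            if v in adj[u] and w in adj[u] and w in adj[v])
    return t / len(S) if S else 0.0

# A 4-clique induces C(4, 3) = 4 triangles, so tau = 4 / 4 = 1.0.
adj = {1: {2, 3, 4}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {1, 2, 3}}
```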
Dynamic Community Detection in Weighted Graph Streams ∗
Abstract

Cited by 1 (0 self)
In this paper, we aim to tackle the problem of discovering dynamic communities in weighted graph streams, especially when the underlying social behavior of individuals varies considerably over different graph regions. To tackle this problem, a novel structure termed Local Weighted-Edge-based Pattern (LWEP) Summary is proposed to describe a local homogeneous region. To efficiently compute LWEPs, some statistics need to be maintained according to the principle of preserving maximum weighted neighbor information with limited memory storage. To this end, the proposed approach is divided into online and offline components. During the online phase, we introduce and track statistics termed top-k neighbor lists and top-k candidate lists. The key is to maintain only the top-k neighbors with the largest link weights for each node. To allow less active neighbors to transition into top-k neighbors, an auxiliary data structure termed the top-k candidate list is used to identify emerging active neighbors. These statistics can be efficiently maintained in the online component. In the offline component, they are used at each snapshot to efficiently compute LWEPs. Clustering is then performed to consolidate LWEPs into high-level clusters. Finally, a mapping is made between clusters of consecutive snapshots to generate temporally smooth communities. Experimental results are presented to illustrate the effectiveness and efficiency of the proposed approach.
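The top-k neighbor list idea can be sketched as follows (a simplified illustration; the paper pairs the list with a candidate list precisely because naive eviction, as here, forgets the weights of evicted neighbors):

```python
def update_topk(topk, neighbor, weight, k=3):
    """Accumulate link weight and keep only the k heaviest neighbors."""
    topk[neighbor] = topk.get(neighbor, 0) + weight
    if len(topk) > k:
        lightest = min(topk, key=topk.get)
        del topk[lightest]   # a candidate list would track this neighbor
    return topk

topk = {}
for nbr, w in [("b", 5), ("c", 1), ("d", 2), ("e", 4)]:
    update_topk(topk, nbr, w, k=3)
# "c" (weight 1) is evicted once the heavier neighbor "e" arrives.
```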
Efficient Densest Subgraph Computation in Evolving Graphs
, 2015
Abstract

Cited by 1 (0 self)
Densest subgraph computation has emerged as an important primitive in a wide range of data analysis tasks such as community and event detection. Social media such as Facebook and Twitter are highly dynamic, with new friendship links and tweets being generated incessantly, calling for efficient algorithms that can handle very large and highly dynamic input data. While either scalable or dynamic algorithms for finding densest subgraphs have been proposed, a viable and satisfactory solution for addressing both the dynamic aspect of the input data and its large size is still missing. We study the densest subgraph problem in the dynamic graph model, for which we present the first scalable algorithm with provable guarantees. In our model, edges are added adversarially while they are removed uniformly at random.
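The paper's dynamic algorithm is only outlined in the abstract; the static primitive it builds on, greedy peeling for the |E|/|V| objective (Charikar's classic 2-approximation), can be sketched as:

```python
def greedy_densest(adj):
    """Peel minimum-degree vertices, remembering the intermediate
    subgraph with the best average-degree density |E| / |V|."""
    adj = {v: set(ns) for v, ns in adj.items()}
    edges = sum(len(ns) for ns in adj.values()) // 2
    best_density, best_set = 0.0, set(adj)
    while adj:
        density = edges / len(adj)
        if density > best_density:
            best_density, best_set = density, set(adj)
        v = min(adj, key=lambda u: len(adj[u]))  # peel min-degree vertex
        edges -= len(adj[v])
        for u in adj[v]:
            adj[u].discard(v)
        del adj[v]
    return best_set, best_density

# 4-clique plus a pendant vertex: peeling recovers the clique (density 1.5).
adj = {"a": {"b", "c", "d"}, "b": {"a", "c", "d"},
       "c": {"a", "b", "d"}, "d": {"a", "b", "c", "e"}, "e": {"d"}}
best_set, best_density = greedy_densest(adj)
```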
Finding Subgraphs with Maximum Total Density and Limited Overlap
Abstract

Cited by 1 (0 self)
Finding dense subgraphs in large graphs is a key primitive in a variety of real-world application domains, encompassing social network analytics, event detection, biology, and finance. In most such applications, one typically aims at finding several (possibly overlapping) dense subgraphs, which might correspond to communities in social networks or interesting events. While a large amount of work is devoted to finding a single densest subgraph, perhaps surprisingly, the problem of finding several dense subgraphs with limited overlap has not been studied in a principled way, to the best of our knowledge. In this work we define and study a natural generalization of the densest subgraph problem, where the main goal is to find at most k subgraphs with maximum total aggregate density, while satisfying an upper bound on the pairwise Jaccard coefficient between the sets of nodes of the subgraphs. After showing that this problem is NP-hard, we devise an efficient algorithm that comes with provable guarantees in some cases of interest, as well as an efficient practical heuristic. Our extensive evaluation on large real-world graphs confirms the efficiency and effectiveness of our algorithms.
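The overlap constraint is stated purely in terms of Jaccard coefficients between vertex sets, which is easy to check directly (an illustrative helper, not the paper's algorithm):

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two node sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def overlap_ok(subgraphs, threshold):
    """True iff every pair of subgraphs overlaps by at most the threshold."""
    return all(jaccard(s, t) <= threshold
               for s, t in combinations(subgraphs, 2))

# First two candidates share {3, 4} out of {1, ..., 6}: Jaccard = 1/3.
candidates = [{1, 2, 3, 4}, {3, 4, 5, 6}, {7, 8}]
```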