REX: Explaining Relationships between Entity Pairs
, 2011
Cited by 16 (0 self)
Knowledge bases of entities and relations (either constructed manually or automatically) are behind many real world search engines, including those at Yahoo!, Microsoft, and Google. Those knowledge bases can be viewed as graphs with nodes representing entities and edges representing (primary) relationships, and various studies have been conducted on how to leverage them to answer entity-seeking queries. Meanwhile, in a complementary direction, analyses over the query logs have enabled researchers to identify entity pairs that are statistically correlated. Such entity relationships are then presented to search users through the “related searches” feature in modern search engines. However, entity relationships thus discovered can often be “puzzling” to the users because why the entities are connected is often indescribable. In this paper, we propose a novel problem called entity relationship explanation, which seeks to explain why a pair of entities are connected, and solve this challenging problem by integrating the above two complementary approaches, i.e., we leverage the knowledge base to “explain” the connections discovered between entity pairs. More specifically, we present REX, a system that takes a pair of entities in a given knowledge base as input and efficiently identifies a ranked list of relationship explanations. We formally define relationship explanations and analyze their desirable properties. Furthermore, we design and implement algorithms to efficiently enumerate and rank all relationship explanations based on multiple measures of “interestingness.” We perform extensive experiments over real web-scale data gathered from DBpedia and a commercial search engine, demonstrating the efficiency and scalability of REX. We also perform user studies to corroborate the effectiveness of explanations generated by REX.
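To make the idea of explaining a connection through a knowledge base concrete, the following is a minimal sketch, not the REX algorithm itself. It assumes that an explanation is simply a simple path between the two entities and that shorter paths rank higher; the abstract does not specify either point. The triples and the names `build_graph` and `explain` are hypothetical.

```python
from collections import defaultdict

def build_graph(triples):
    """Index (subject, relation, object) triples as an adjacency map.

    Each edge is also stored in reverse (marked with "^-1") so that
    paths may traverse relations in either direction.
    """
    adj = defaultdict(list)
    for s, r, o in triples:
        adj[s].append((r, o))
        adj[o].append((r + "^-1", s))
    return adj

def explain(adj, start, end, max_len=3):
    """Enumerate simple paths from start to end with at most max_len edges.

    Assumption: path length is used as a stand-in ranking measure.
    """
    results = []

    def dfs(node, visited, path):
        if node == end:
            results.append(list(path))
            return
        if len(path) == max_len:
            return
        for rel, nxt in adj[node]:
            if nxt not in visited:
                visited.add(nxt)
                path.append((node, rel, nxt))
                dfs(nxt, visited, path)
                path.pop()
                visited.remove(nxt)

    dfs(start, {start}, [])
    return sorted(results, key=len)  # shortest explanations first

# Hypothetical toy knowledge base.
triples = [
    ("actor_a", "starred_in", "film_x"),
    ("actor_b", "starred_in", "film_x"),
    ("actor_a", "spouse", "actor_b"),
]
explanations = explain(build_graph(triples), "actor_a", "actor_b", max_len=2)
```

On this toy input the direct `spouse` edge is listed before the two-step path through `film_x`. A real system would need richer explanation shapes and ranking measures than path length.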
Towards proximity pattern mining in large graphs
 In Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data. SIGMOD ’10. ACM
, 2010
Cited by 13 (3 self)
Mining graph patterns in large networks is critical to a variety of applications such as malware detection and biological module discovery. However, frequent subgraphs are often ineffective at capturing the associations present in these applications, due to the complexity of isomorphism testing and the inelastic pattern definition. In this paper, we introduce the proximity pattern, which is a significant departure from the traditional concept of frequent subgraphs. Defined as a set of labels that co-occur in neighborhoods, a proximity pattern blurs the boundary between itemset and structure. It relaxes the rigid structure constraint of frequent subgraphs, while introducing connectivity to frequent itemsets. Therefore, it can benefit from both: efficient mining in itemsets and structure proximity from graphs. We developed two models to define proximity patterns. The second one, called Normalized Probabilistic Association (NmPA), is able to transform a complex graph mining problem into a simplified probabilistic itemset mining problem, which can be solved efficiently by a modified FP-tree algorithm, called pFP. NmPA and pFP are evaluated on real-life social and intrusion networks. Empirical results show that the approach not only finds interesting patterns that are ignored by existing approaches, but also achieves high performance for finding proximity patterns in large-scale graphs.
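The notion of labels co-occurring in neighborhoods can be illustrated with a deliberately simplified sketch. It is not NmPA or pFP. It assumes a neighborhood is a vertex plus its direct neighbors, and it counts a label set once for every vertex whose neighborhood contains all of its labels. The function names and the toy graph are hypothetical.

```python
from collections import Counter
from itertools import combinations

def neighborhood_labels(adj, labels, v):
    """Labels found on v and on its direct neighbors (assumed 1-hop neighborhood)."""
    found = set(labels[v])
    for u in adj[v]:
        found |= set(labels[u])
    return found

def cooccurring_label_sets(adj, labels, size=2, min_support=2):
    """Count label sets of the given size per neighborhood; keep the frequent ones."""
    counts = Counter()
    for v in adj:
        nearby = sorted(neighborhood_labels(adj, labels, v))
        for itemset in combinations(nearby, size):
            counts[itemset] += 1
    return {s: c for s, c in counts.items() if c >= min_support}

# Hypothetical toy graph: a path 1 - 2 - 3 plus an isolated vertex 4.
adj = {1: [2], 2: [1, 3], 3: [2], 4: []}
labels = {1: {"a"}, 2: {"b"}, 3: {"c"}, 4: {"a"}}
frequent = cooccurring_label_sets(adj, labels, size=2, min_support=2)
```

Here `("a", "b")` and `("b", "c")` each appear in two neighborhoods, while `("a", "c")` appears only around vertex 2 and is filtered out. No structural match is required, which is the relaxation the abstract describes.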
Subgraph support in a single large graph
 In: ICDM ’07: Proceedings of the Seventh IEEE International Conference on Data Mining Workshops
, 2007
Cited by 11 (0 self)
Defining the support (or frequency) of a subgraph is trivial when a database of graphs is given: it is simply the number of graphs in the database that contain the subgraph. However, if the input is one large graph, an appropriate support definition is much more difficult to find. In this paper we study the core problem, namely overlapping embeddings of the subgraph, in detail and suggest a definition that relies on the non-existence of equivalent ancestor embeddings in order to guarantee that the resulting support is anti-monotone. We prove this property and describe a method to compute the support defined in this way.
Graph data management and mining: a survey of algorithms and applications
 Wang (Eds.), Managing and Mining Graph Data, of Advances in Database Systems
, 2010
Mining the temporal dimension of the information propagation
 In IDA
, 2009
Cited by 7 (7 self)
In the last decade, Social Network Analysis has been a field in which the effort devoted by researchers in the Data Mining area has increased very fast. Among the possible related topics, the study of information propagation in a network has attracted the interest of many researchers, including from the industrial world. However, only a few answers to the questions “How does information propagate over a network, why, and how fast?” have been discovered so far. On the other hand, these answers are of great interest, since they help in the tasks of finding experts in a network, assessing viral marketing strategies, and identifying fast or slow paths of information inside a collaborative network. In this paper we study the problem of finding frequent patterns in a network with the help of two different techniques: TAS (Temporally Annotated Sequences) mining, aimed at extracting sequential patterns where each transition between two events is annotated with a typical transition time that emerges from input data, and Graph Mining, which is helpful for locally analyzing the nodes of the networks with their properties. Finally, we show preliminary results on mining information propagation over a network, obtained on two well-known email datasets, that show the power of combining these two approaches.
GRAMI: Frequent Subgraph and Pattern Mining in a Single Large Graph
Cited by 7 (0 self)
Mining frequent subgraphs is an important operation on graphs; it is defined as finding all subgraphs that appear frequently in a database according to a given frequency threshold. Most existing work assumes a database of many small graphs, but modern applications, such as social networks, citation graphs, or protein-protein interactions in bioinformatics, are modeled as a single large graph. In this paper we present GRAMI, a novel framework for frequent subgraph mining in a single large graph. GRAMI undertakes a novel approach that only finds the minimal set of instances to satisfy the frequency threshold and avoids the costly enumeration of all instances required by previous approaches. We accompany our approach with a heuristic and optimizations that significantly improve performance. Additionally, we present an extension of GRAMI that mines frequent patterns. Compared to subgraphs, patterns offer a more powerful version of matching that captures transitive interactions between graph nodes (like friend of a friend) which are very common in modern applications. Finally, we present CGRAMI, a version supporting structural and semantic constraints, and AGRAMI, an approximate version producing results with no false positives. Our experiments on real data demonstrate that our framework is up to 2 orders of magnitude faster and discovers more interesting patterns than existing approaches.
Anti-monotonic Overlap-graph Support Measures
 In Proceedings of the Eighth IEEE International Conference on Data Mining
, 2008
Cited by 6 (1 self)
In graph mining, a frequency measure is anti-monotonic if the frequency of a pattern never exceeds the frequency of a subpattern. The efficiency and correctness of most graph pattern miners rely critically on this property. We study the case where the dataset is a single graph. Vanetik, Gudes and Shimony already gave sufficient and necessary conditions for anti-monotonicity of measures depending only on the edge-overlaps between the instances of the pattern in a labeled graph. We extend these results to homomorphisms, isomorphisms and homeomorphisms on both labeled and unlabeled, directed and undirected graphs, for vertex and edge overlap. We show a set of reductions between the different morphisms that preserve overlap. We also prove that the popular maximum independent set measure assigns the minimal possible meaningful frequency, introduce a new measure based on the minimum clique partition that assigns the maximum possible meaningful frequency, and introduce a new measure sandwiched between the former two based on the polynomial-time computable Lovász θ-function.
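The maximum independent set measure mentioned above can be sketched on a tiny example. The code assumes each pattern instance is given as a set of edges and that two instances overlap when they share an edge. It uses brute force, so it only suits toy inputs. The function names and the example graph are hypothetical.

```python
from itertools import combinations

def overlap_graph(instances):
    """Pairs (i, j), i < j, of instances that share at least one edge."""
    n = len(instances)
    return {
        (i, j)
        for i, j in combinations(range(n), 2)
        if instances[i] & instances[j]
    }

def mis_support(instances):
    """Size of a maximum independent set in the overlap graph (brute force)."""
    conflicts = overlap_graph(instances)
    n = len(instances)
    for k in range(n, 0, -1):  # try the largest candidate sets first
        for subset in combinations(range(n), k):
            if not any(pair in conflicts for pair in combinations(subset, 2)):
                return k
    return 0

# Hypothetical data graph: the path a - b - c - d.
e1, e2, e3 = ("a", "b"), ("b", "c"), ("c", "d")

# Instances of the single-edge pattern, and of its two-edge-path superpattern.
one_edge = [frozenset([e1]), frozenset([e2]), frozenset([e3])]
two_edge = [frozenset([e1, e2]), frozenset([e2, e3])]

support_sub = mis_support(one_edge)
support_super = mis_support(two_edge)
```

The two instances of the two-edge path share the edge `b - c`, so only one of them can be counted, while the three single-edge instances are pairwise disjoint. The larger pattern therefore gets support 1 and its subpattern gets support 3, consistent with anti-monotonicity.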
Mining Graph Patterns
 Managing and Mining Graph Data, volume 40 of Advances in Database Systems
, 2010
Constraint-Based Mining of Sets of Cliques Sharing Vertex Properties
 In ACNE’10
, 2010
Cited by 4 (0 self)
We consider data mining methods on large graphs where a set of labels is associated with each vertex. A typical example of such graphs is a social network of collaborating researchers where additional information represents the main publication targets (preferred conferences or journals) for each author. We investigate the extraction of sets of dense subgraphs such that the vertices in all subgraphs of a set share a large enough set of labels. As a first step, we consider here the special case of dense subgraphs that are cliques. We propose a method to compute all maximal homogeneous clique sets that satisfy user-defined constraints on the number of separated cliques, on the size of the cliques, and on the number of labels shared by all the vertices. The empirical validation illustrates the scalability of our approach and provides experimental feedback on two real datasets, more precisely an annotated social network derived from the DBLP database and an enriched biological network of protein-protein interactions. In both cases, we discuss the relevancy of extracted patterns thanks to available domain knowledge.
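The three user-defined constraints can be illustrated with a small checker. This sketch only validates a given candidate set of cliques and does not perform the mining or the maximality test described in the abstract. It does not check that the cliques are separated, since the abstract does not define that term. The function names, parameter names, and toy data are all hypothetical.

```python
from itertools import combinations

def is_clique(adj, vertices):
    """True if every pair of the given vertices is adjacent."""
    return all(v in adj[u] for u, v in combinations(vertices, 2))

def shared_labels(labels, cliques):
    """Labels carried by every vertex of every clique in the set."""
    vertices = [v for clique in cliques for v in clique]
    return set.intersection(*(set(labels[v]) for v in vertices))

def satisfies_constraints(adj, labels, cliques, min_cliques, min_size, min_labels):
    """Check the number of cliques, their sizes, and the number of shared labels."""
    if not cliques or len(cliques) < min_cliques:
        return False
    if any(len(c) < min_size or not is_clique(adj, c) for c in cliques):
        return False
    return len(shared_labels(labels, cliques)) >= min_labels

# Hypothetical co-authorship graph: a triangle, an edge, and an isolated vertex.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}, 4: {5}, 5: {4}, 6: set()}
labels = {
    1: {"conf_a", "conf_b"},
    2: {"conf_a", "conf_b"},
    3: {"conf_a"},
    4: {"conf_a", "conf_c"},
    5: {"conf_a"},
    6: {"conf_c"},
}
candidate = [(1, 2, 3), (4, 5)]
ok = satisfies_constraints(adj, labels, candidate, min_cliques=2, min_size=2, min_labels=1)
```

In this toy graph the two cliques share only the label `conf_a`, so the candidate passes with `min_labels=1` and would fail with `min_labels=2`.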