Results 1 - 7 of 7
Diversified top-k graph pattern matching
 PVLDB

Cited by 6 (0 self)
Graph pattern matching has been widely used in, e.g., social data analysis. A number of matching algorithms have been developed that, given a graph pattern Q and a graph G, compute the set M(Q,G) of matches of Q in G. However, these algorithms often return an excessive number of matches and are expensive on large real-life social graphs. Moreover, in practice many social queries aim to find matches of a specific pattern node, rather than the entire M(Q,G). This paper studies top-k graph pattern matching. (1) We revise graph pattern matching defined in terms of simulation by supporting a designated output node u_o. Given G and Q, the goal is to find those nodes in M(Q,G) that match u_o, instead of the large set M(Q,G). (2) We study two classes of functions for ranking the matches: relevance functions δ_r() based on, e.g., social impact, and distance functions δ_d() to cover diverse elements. (3) We develop two algorithms for computing top-k matches of u_o based on δ_r(), with the early termination property, i.e., they find top-k matches without computing the entire M(Q,G). (4) We also study diversified top-k matching, a bi-criteria optimization problem based on both δ_r() and δ_d(). We show that its decision problem is NP-complete. Nonetheless, we provide an approximation algorithm with performance guarantees and a heuristic one with the early termination property. (5) Using real-life and synthetic data, we experimentally verify that our (diversified) top-k matching algorithms are effective and outperform traditional matching algorithms in efficiency.
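Simulation-based matching with a designated output node can be illustrated with a small Python sketch. It computes the maximum simulation relation by iterative refinement: a data node v stays a match of pattern node u only if the labels agree and, for every pattern edge (u, u'), v has some successor that matches u'. It then reads off the matches of u_o and ranks them with a caller-supplied relevance function standing in for δ_r(). This is a naive baseline under stated assumptions, not the paper's algorithm: it computes the entire relation, so it lacks the early termination property, and it ignores diversification via δ_d(). The function names and the dictionary-based graph encoding are hypothetical.

```python
from collections import defaultdict


def graph_simulation(q_labels, q_edges, g_labels, g_edges):
    """Maximum simulation relation: pattern node -> set of data nodes.

    q_labels / g_labels map node -> label; q_edges / g_edges are (src, dst) pairs.
    Returns empty sets for every pattern node if some pattern node has no match.
    """
    g_succ = defaultdict(set)
    for a, b in g_edges:
        g_succ[a].add(b)
    q_succ = defaultdict(set)
    for a, b in q_edges:
        q_succ[a].add(b)
    # Start from label-compatible candidates for every pattern node.
    sim = {u: {v for v, lab in g_labels.items() if lab == q_labels[u]} for u in q_labels}
    changed = True
    while changed:
        changed = False
        for u in q_labels:
            for u_child in q_succ[u]:
                # v can match u only if some successor of v matches u_child.
                kept = {v for v in sim[u] if g_succ[v] & sim[u_child]}
                if kept != sim[u]:
                    sim[u] = kept
                    changed = True
    if any(not matches for matches in sim.values()):
        return {u: set() for u in q_labels}
    return sim


def top_k_output_matches(q_labels, q_edges, g_labels, g_edges, u_o, k, relevance):
    """Hypothetical helper: top-k matches of output node u_o by a relevance score."""
    sim = graph_simulation(q_labels, q_edges, g_labels, g_edges)
    # Higher relevance first; ties broken by node id for a deterministic order.
    return sorted(sim[u_o], key=lambda v: (-relevance(v), v))[:k]
```

For instance, with a pattern edge PM to SE, a PM node whose only successor is another PM node is dropped from the relation, no matter how high its relevance score is.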
Ontology-based subgraph querying
 In ICDE, 2013

Cited by 3 (2 self)
Subgraph querying has been applied in a variety of emerging applications. Traditional subgraph querying based on subgraph isomorphism requires identical label matching, which is often too restrictive to capture matches that are semantically close to the query graphs. This paper extends subgraph querying to identify semantically related matches by leveraging ontology information. (1) We introduce ontology-based subgraph querying, which revises subgraph isomorphism by mapping a query to semantically related subgraphs in terms of a given ontology graph. We introduce a metric to measure the similarity of the matches and, based on this metric, an optimization problem to find the top-K best matches. (2) We provide a filtering-and-verification framework to identify (top-K) matches for ontology-based subgraph queries. The framework efficiently extracts a small subgraph of the data graph from an ontology index, and then computes the matches by accessing only the extracted subgraph. (3) In addition, we show that the ontology index can be efficiently updated upon changes to the data graphs, enabling the framework to cope with dynamic data graphs. (4) We experimentally verify the effectiveness and efficiency of our framework on both synthetic and real-life graphs, in comparison with traditional subgraph querying methods.
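The filtering idea can be sketched in Python under loudly stated assumptions: the ontology is an undirected graph over labels, and the similarity between two labels is taken to decay geometrically with their hop distance in that graph. This decay metric, the threshold `min_sim`, and all function names are hypothetical illustrations, not the similarity metric or ontology index defined in the paper. A verification step (for example, subgraph isomorphism restricted to the surviving candidates) would follow.

```python
from collections import deque


def ontology_distance(ontology, a, b):
    """Hop distance between labels a and b in an undirected ontology graph, or None."""
    if a == b:
        return 0
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        label, d = queue.popleft()
        for nxt in ontology.get(label, ()):
            if nxt == b:
                return d + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))
    return None


def label_similarity(ontology, a, b, decay=0.5):
    """Hypothetical metric: 1.0 for identical labels, multiplied by decay per ontology hop."""
    d = ontology_distance(ontology, a, b)
    return 0.0 if d is None else decay ** d


def filter_candidates(ontology, query_label, data_labels, min_sim=0.5):
    """Keep data nodes whose label is semantically close enough to query_label."""
    return {
        v
        for v, label in data_labels.items()
        if label_similarity(ontology, query_label, label) >= min_sim
    }
```

With an ontology in which "sedan" is one hop from "car" and "truck" is two hops away (via "vehicle"), a query label "car" keeps nodes labeled "car" or "sedan" at the default threshold, whereas exact label matching would keep only "car".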
DSI: A Method for Indexing Large Graphs Using Distance Set

Cited by 1 (0 self)
In recent years we have witnessed a great increase in modeling data as large graphs in multiple domains, such as XML, the semantic web, and social networks. In these circumstances, researchers are interested in queries of the following form: given a large graph G and a query Q, report all the matches of Q in G. Since subgraph isomorphism checking has been proved NP-complete [1], it is infeasible to scan the whole large graph for answers, especially when the query is also large. Hence, the "filter-verification" approach is widely adopted. In this approach, researchers first index the neighborhood of each vertex in the large graph, then filter vertices, and finally run a subgraph matching algorithm. Previous techniques mainly focus on efficient matching algorithms, paying little attention to indexing techniques. However, appropriate indexing techniques can improve query response time by generating fewer candidates. In this paper we investigate indexing techniques on large graphs and propose an index structure, DSI (Distance Set Index), to capture the neighborhood of each vertex. With the distance set index, more vertices can be pruned, resulting in a much smaller search space, in which a subgraph matching algorithm is then performed. We have applied our index structure to real and synthetic datasets. Extensive experiments demonstrate the efficiency and effectiveness of our indexing technique.
Key words: Graph Indexing, Distance Set
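The pruning idea behind a distance-based neighborhood index can be sketched as follows. This is an assumption about the general technique, not the DSI structure as defined in the paper: for each vertex we store, for every radius d up to a hypothetical bound h, the set of labels found within shortest-path distance d. A data vertex v stays a candidate for query vertex u only if the labels agree and each of u's label sets is contained in v's set for the same radius. The test is sound for (non-induced) subgraph isomorphism on undirected graphs, because an embedding maps distinct query vertices to distinct data vertices and can only shorten distances. All function names are illustrative.

```python
from collections import deque


def label_sets_within(adj, labels, root, h):
    """For d = 1..h, the labels of vertices at shortest-path distance 1..d from root."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        v = queue.popleft()
        if dist[v] == h:
            continue  # no need to explore beyond radius h
        for w in adj.get(v, ()):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return [{labels[w] for w, dw in dist.items() if 0 < dw <= d} for d in range(1, h + 1)]


def build_index(adj, labels, h):
    """Hypothetical offline index: one list of h label sets per data vertex."""
    return {v: label_sets_within(adj, labels, v, h) for v in labels}


def candidates(q_adj, q_labels, u, g_labels, g_index, h):
    """Data vertices that survive label and distance-set pruning for query vertex u."""
    q_sets = label_sets_within(q_adj, q_labels, u, h)
    return {
        v
        for v in g_labels
        if g_labels[v] == q_labels[u]
        and all(qs <= gs for qs, gs in zip(q_sets, g_index[v]))
    }
```

For a query path A-B-C, a data vertex labeled A whose only neighbor is a B vertex with no C beyond it passes a label-only filter but is pruned here, since its radius-2 label set lacks C.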
unknown title
Graph pattern matching has been widely used in, e.g., social data analysis. A number of matching algorithms have been developed that, given a graph pattern Q and a graph G, compute the set M(Q,G) of matches of Q in G. However, these algorithms often return an excessive number of matches and are expensive on large real-life social graphs. Moreover, in practice many social queries aim to find matches of a specific pattern node, rather than the entire M(Q,G). This paper studies top-k graph pattern matching. (1) We revise graph pattern matching defined in terms of simulation by supporting a designated output node u_o. Given G and Q, the goal is to find those nodes in M(Q,G) that match u_o, instead of the large set M(Q,G). (2) We study two classes of functions for ranking the matches: relevance functions δ_r() based on, e.g., social impact, and distance functions δ_d() to cover diverse elements. (3) We develop two algorithms for computing top-k matches of u_o based on δ_r(), with the early termination property, i.e., they find top-k matches without computing the entire M(Q,G). (4) We also study diversified top-k matching, a bi-criteria optimization problem based on both δ_r() and δ_d(). We show that its decision problem is NP-complete. Nonetheless, we provide an approximation algorithm with performance guarantees and a heuristic one with the early termination property. (5) Using real-life and synthetic data, we experimentally verify that our (diversified) top-k matching algorithms are effective and outperform traditional matching algorithms in efficiency.
OUTLIER DETECTION FOR INFORMATION NETWORKS BY
The study of networks has emerged in diverse disciplines as a means of analyzing complex relationship data. There has been a significant amount of work in network science that studies properties of networks, querying over networks, link analysis, influence propagation, network optimization, and many other forms of network analysis. Only recently has there been some work on outlier detection for information network data. Outlier (or anomaly) detection is a very broad field and has been studied in the context of a large number of application domains. Many algorithms have been proposed for outlier detection in high-dimensional data, uncertain data, stream data, and time series data. By its inherent nature, network data poses very different challenges that need to be addressed in a special way. Network data is gigantic, contains nodes of different types and rich nodes with associated attribute data, has noisy attribute and link data, and evolves dynamically in multiple ways. This thesis focuses on outlier detection for such networks from two interesting perspectives: (1) community-based outliers and (2) query-based outliers. For community-based outliers, we discuss the problem in both static and dynamic settings.
Top-K Interesting Subgraph Discovery in Information Networks
In the real world, various systems can be modeled using heterogeneous networks, which consist of entities of different types. Many problems on such networks can be mapped to an underlying critical problem of discovering top-K subgraphs of entities with rare and surprising associations. Answering such subgraph queries efficiently involves two main challenges: (1) computing all matching subgraphs that satisfy the query and (2) ranking such results based on the rarity and the interestingness of the associations among entities in the subgraphs. Previous work on the matching problem can be harnessed for a naïve ranking-after-matching solution. However, for large graphs, subgraph queries may have an enormous number of matches, so it is inefficient to compute all matches when only the top-K matches are desired. In this paper, we address the two challenges of matching and ranking in top-K subgraph discovery as follows. First, we introduce two index structures for the network, a topology index and a graph maximum meta-path weight index, both computed offline. Second, we propose novel top-K mechanisms that exploit these indexes to answer interesting subgraph queries online efficiently. Experimental results on several synthetic datasets and on the DBLP and Wikipedia datasets containing thousands of entities show the efficiency and effectiveness of the proposed approach in computing interesting subgraphs.
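Top-K retrieval without computing all matches can be illustrated with a best-first search. The sketch assumes a path-shaped query given as a sequence of entity types, additive edge weights as a stand-in for interestingness, and, in place of the paper's graph maximum meta-path weight index, a hypothetical table of the maximum edge weight between each pair of types. Because the resulting upper bound never underestimates the best completion of a partial match, complete matches are popped from the heap in non-increasing score order, and the search stops after K of them. The scoring model and all names are illustrative assumptions, not the paper's mechanism.

```python
import heapq
import itertools
from collections import defaultdict


def top_k_paths(node_type, edges, query, k):
    """Return up to k (score, path) pairs for a path query, best first.

    node_type: node -> entity type; edges: (u, v) -> weight (treated as undirected);
    query: list of entity types [T0, ..., Tm] that a matching simple path must follow.
    """
    adj = defaultdict(list)
    max_w = defaultdict(float)  # hypothetical per-type-pair bound (stand-in for an offline index)
    for (a, b), w in edges.items():
        adj[a].append((b, w))
        adj[b].append((a, w))
        for pair in ((node_type[a], node_type[b]), (node_type[b], node_type[a])):
            max_w[pair] = max(max_w[pair], w)
    m = len(query) - 1
    # rest[i]: upper bound on the total weight of query edges i..m-1.
    rest = [0.0] * (m + 1)
    for i in range(m - 1, -1, -1):
        rest[i] = rest[i + 1] + max_w[(query[i], query[i + 1])]
    tie = itertools.count()  # tie-breaker so heap tuples never compare paths
    heap = [(-rest[0], next(tie), 0.0, (v,)) for v, t in node_type.items() if t == query[0]]
    heapq.heapify(heap)
    results = []
    while heap and len(results) < k:
        _, _, score, path = heapq.heappop(heap)
        n = len(path)
        if n == m + 1:
            # Complete match: its bound equals its score, so no heap entry can beat it.
            results.append((score, path))
            continue
        for nxt, w in adj[path[-1]]:
            if node_type[nxt] == query[n] and nxt not in path:
                new_score = score + w
                heapq.heappush(heap, (-(new_score + rest[n]), next(tie), new_score, path + (nxt,)))
    return results
```

Compared with ranking after matching, low-scoring partial paths whose bound falls below the K-th result are never expanded.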