Multi-constrained graph pattern matching in large-scale contextual social graphs
 in ICDE’15, 2015
Cited by 1 (0 self)
Abstract—Graph Pattern Matching (GPM) plays a significant role in social network analysis and has been widely used in, for example, expert finding, social community mining and social position detection. Given a pattern graph G_Q and a data graph G_D, a GPM algorithm finds those subgraphs, G_M, that match G_Q in G_D. However, existing GPM methods do not consider multiple constraints on the edges of G_Q, which commonly exist in applications such as crowdsourcing travel, social-network-based e-commerce and study group selection. In this paper, we first conceptually extend Bounded Simulation to Multi-Constrained Simulation (MCS), and propose a novel NP-Complete Multi-Constrained Graph Pattern Matching (MCGPM) problem. Then, to address the efficiency issue in large-scale MCGPM, we propose a new concept called Strong Social Component (SSC), consisting of participants with strong social connections. We also propose an approach to identify SSCs, together with a novel index method and a graph compression method for SSCs. Moreover, we devise a heuristic algorithm to identify MCGPM results effectively and efficiently without decompressing graphs. An extensive empirical study on five real-world large-scale social graphs has demonstrated the effectiveness, efficiency and scalability of our approach.
Querying big data: Bridging theory and practice
 J. Comput. Sci. Technol
Cited by 1 (1 self)
Abstract Big data introduces challenges to query answering, from theory to practice. A number of questions arise. What queries are "tractable" on big data? How can we make big data "small" so that it is feasible to find exact query answers? When exact answers are beyond reach in practice, what approximation theory can help us strike a balance between the quality of approximate query answers and the costs of computing such answers? To get sensible query answers in big data, what else must we do in addition to coping with the size of the data? This position paper aims to provide an overview of recent advances in the study of querying big data. We propose approaches to tackling these challenging issues, and identify open problems for future research.
Optimal Enumeration: Efficient Top-k Tree Matching
Driven by many real applications, graph pattern matching has attracted a great deal of attention recently. A twig-pattern matching may result in an extremely large number of matches in a graph; this may not only confuse users by providing too many results but also lead to high computational costs. In this paper, we study the problem of top-k tree pattern matching; that is, given a rooted tree T, compute its top-k matches in a directed graph G based on the twig-pattern matching semantics. First, we present a novel and optimal enumeration paradigm based on the principle of Lawler’s procedure. We show that our enumeration algorithm runs in O(n_T + log k) time in each round, where n_T is the number of nodes in T. Considering that the time complexity to output a match of T is O(n_T) and n_T ≥ log k in practice, our enumeration technique is optimal. Moreover, the cost of generating the top-1 match of T in our algorithm is O(m_R), where m_R is the number of edges in the transitive closure of a data graph G involving all nodes relevant to T. O(m_R) is also optimal in the worst case without pre-knowledge of G. Consequently, our algorithm is optimal with the running time O(m_R + k(n_T + log k)), in contrast to the time complexity O(m_R log k + k n_T (log k + d_T)) of the existing technique, where d_T is the maximal node degree in T. Second, a novel priority-based access technique is proposed, which greatly reduces the number of edges accessed and results in a significant performance improvement. Finally, we apply our techniques to the general form of the top-k graph pattern matching problem (i.e., the query is a graph) to improve the existing techniques. Comprehensive empirical studies demonstrate that our techniques may improve the existing techniques by orders of magnitude.
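The per-round optimality claim above rests on the stated assumption that n_T ≥ log k. A short derivation, using only the quantities named in the abstract (the decomposition of the total into one first match plus k rounds is our reading of the abstract, not a statement from the paper):

```latex
n_T \ge \log k
  \;\Longrightarrow\; n_T + \log k \le 2\,n_T
  \;\Longrightarrow\; O(n_T + \log k) = O(n_T)

\underbrace{O(m_R)}_{\text{top-1 match}}
  + k \cdot \underbrace{O(n_T + \log k)}_{\text{one round}}
  = O\bigl(m_R + k\,(n_T + \log k)\bigr)
```

Since writing out a single match already costs O(n_T), each round meets that lower bound up to a constant factor under this assumption.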
Scalable Graph Exploration and Visualization: Sensemaking Challenges and Opportunities
Making sense of large graph datasets is a fundamental and challenging process that advances science, education and technology. We survey research on graph exploration and visualization approaches aimed at addressing this challenge. Unlike existing surveys, our investigation highlights approaches that have strong potential in handling large graphs, algorithmically, visually, or interactively; we also explicitly connect relevant works from multiple research fields (data mining, machine learning, human-computer interaction, information visualization, information retrieval, and recommender systems) to underline their parallel and complementary contributions to graph sensemaking. We ground our discussion in sensemaking research; we propose a new graph sensemaking hierarchy that categorizes tools and techniques based on how they operate on the graph data (e.g., local vs. global). We summarize and compare their strengths and weaknesses, and highlight open challenges. We conclude with future research directions for graph sensemaking.
Querying Web-Scale Information Networks Through Bounding Matching Scores
Web-scale information networks containing billions of entities are common nowadays. Querying these networks can be modeled as a subgraph matching problem. Since information networks are incomplete and noisy in nature, it is important to discover answers that match exactly as well as answers that are similar to queries. Existing graph matching algorithms usually use graph indices to improve the efficiency of query processing. For web-scale information networks, it may not be feasible to build the graph indices due to the amount of work and the memory/storage required. In this paper, we propose an efficient algorithm for finding the best k answers for a given query without precomputing graph indices. The quality of an answer is measured by a matching score that is computed online. To speed up query processing, we propose a novel technique for bounding the matching scores during the computation. By using bounds, we can efficiently prune low-quality answers without having to evaluate all possible answers. The bounding technique can be implemented in a distributed environment, allowing our approach to efficiently answer queries on web-scale information networks. We demonstrate the effectiveness and the efficiency of our approach through a series of experiments on real-world information networks. The results show that our bounding technique can reduce the running time by up to two orders of magnitude compared with an approach that does not use bounds.
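The idea of pruning with score bounds can be illustrated with a generic top-k sketch. This is not the paper's algorithm: the function name `top_k_with_bounds` and the two callables `upper_bound` (cheap) and `exact_score` (expensive) are hypothetical, and the sketch assumes `upper_bound(c) >= exact_score(c)` for every candidate `c`.

```python
import heapq

def top_k_with_bounds(candidates, k, upper_bound, exact_score):
    """Return (top-k list, number of exact evaluations).

    Assumes upper_bound(c) >= exact_score(c) for every candidate c.
    """
    heap = []        # min-heap of (score, tie_breaker, candidate); heap[0] is the k-th best
    evaluated = 0
    # Visit candidates from the most to the least promising bound.
    ordered = sorted(candidates, key=upper_bound, reverse=True)
    for i, cand in enumerate(ordered):
        # Once k answers are held, a candidate whose bound cannot beat the
        # k-th best score is pruned, and so is everything after it.
        if len(heap) == k and upper_bound(cand) <= heap[0][0]:
            break
        score = exact_score(cand)
        evaluated += 1
        if len(heap) < k:
            heapq.heappush(heap, (score, i, cand))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, i, cand))
    best = [(cand, score) for score, _, cand in sorted(heap, reverse=True)]
    return best, evaluated

# Toy data (made up for illustration): exact scores and looser upper bounds.
exact = {"a": 0.9, "b": 0.5, "c": 0.8, "d": 0.2, "e": 0.1}
bound = {"a": 1.0, "b": 0.9, "c": 0.85, "d": 0.3, "e": 0.2}
best, evaluated = top_k_with_bounds(exact, 2, bound.get, exact.get)
print(best, evaluated)
```

In this toy run only three of the five candidates are scored exactly; the remaining two are discarded on their bounds alone. Tighter bounds prune more, which is the effect the abstract attributes to its bounding technique.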
MAGE: Matching Approximate Patterns in Richly-Attributed Graphs
Abstract—Given a large graph with millions of nodes and edges, say a social network where both its nodes and edges have multiple attributes (e.g., job titles, tie strengths), how can we quickly find subgraphs of interest (e.g., a ring of businessmen with strong ties)? We present MAGE, a scalable, multicore subgraph matching approach that supports expressive queries over large, richly-attributed graphs. Our major contributions include: (1) MAGE supports graphs with both node and edge attributes (most existing approaches handle either one, but not both); (2) it supports expressive queries, allowing multiple attributes on an edge, wildcards as attribute values (i.e., match any permissible values), and attributes with continuous values; and (3) it is scalable, supporting graphs with several hundred million edges. We demonstrate MAGE’s effectiveness and scalability via extensive experiments on large real and synthetic graphs, such as a Google+ social network with 460 million edges.
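The kinds of query attributes listed in contribution (2) can be illustrated with a minimal sketch of a single attribute test. This is not MAGE's implementation or API: the name `attrs_match`, the `"*"` wildcard token, the use of a `(low, high)` tuple for continuous values, and the requirement that a wildcarded attribute be present are all assumptions made for illustration.

```python
WILDCARD = "*"  # hypothetical token meaning "any permissible value"

def attrs_match(query, attrs):
    """Check whether a node's or edge's attributes satisfy a query.

    query maps attribute names to one of:
      - WILDCARD: the attribute must exist, with any value
      - (low, high): a continuous value must lie in the closed range
      - any other value: the attribute must equal it exactly
    """
    for name, wanted in query.items():
        if name not in attrs:
            return False
        value = attrs[name]
        if wanted == WILDCARD:
            continue
        if isinstance(wanted, tuple):
            low, high = wanted
            if not (low <= value <= high):
                return False
        elif value != wanted:
            return False
    return True

# An edge query with several attributes at once: a strong tie of any type.
strong_tie = {"tie_strength": (0.7, 1.0), "type": WILDCARD}
print(attrs_match(strong_tie, {"tie_strength": 0.9, "type": "colleague"}))
print(attrs_match(strong_tie, {"tie_strength": 0.3, "type": "colleague"}))
```

A subgraph matcher would apply a test of this kind to every candidate node and edge; the scalability the abstract describes comes from how candidates are generated and distributed across cores, which this sketch does not attempt to show.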