Results 1 - 10 of 15
Diversified top-k graph pattern matching
 PVLDB
Cited by 6 (0 self)
Graph pattern matching has been widely used in, e.g., social data analysis. A number of matching algorithms have been developed that, given a graph pattern Q and a graph G, compute the set M(Q,G) of matches of Q in G. However, these algorithms often return an excessive number of matches, and are expensive on large real-life social graphs. Moreover, in practice many social queries are to find matches of a specific pattern node, rather than the entire M(Q,G). This paper studies top-k graph pattern matching. (1) We revise graph pattern matching defined in terms of simulation, by supporting a designated output node u_o. Given G and Q, it is to find those nodes in M(Q,G) that match u_o, instead of the large set M(Q,G). (2) We study two classes of functions for ranking the matches: relevance functions δr() based on, e.g., social impact, and distance functions δd() to cover diverse elements. (3) We develop two algorithms for computing top-k matches of u_o based on δr(), with the early termination property, i.e., they find top-k matches without computing the entire M(Q,G). (4) We also study diversified top-k matching, a bi-criteria optimization problem based on both δr() and δd(). We show that its decision problem is NP-complete. Nonetheless, we provide an approximation algorithm with performance guarantees and a heuristic one with the early termination property. (5) Using real-life and synthetic data, we experimentally verify that our (diversified) top-k matching algorithms are effective, and outperform traditional matching algorithms in efficiency.
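To make the bi-criteria idea concrete, the following is a minimal sketch of a generic greedy selection that trades relevance against pairwise distance. It is not the approximation algorithm or the heuristic from the paper: the weighted-sum objective, the weight `lam`, and the function names are all assumptions chosen for illustration, and the greedy loop carries no stated guarantee here.

```python
from itertools import combinations
from typing import Callable, Hashable, Iterable, List

Match = Hashable


def diversification_score(
    selected: List[Match],
    relevance: Callable[[Match], float],
    distance: Callable[[Match, Match], float],
    lam: float,
) -> float:
    """Assumed objective: (1 - lam) * total relevance + lam * total pairwise distance."""
    rel = sum(relevance(v) for v in selected)
    div = sum(distance(a, b) for a, b in combinations(selected, 2))
    return (1.0 - lam) * rel + lam * div


def greedy_diversified_top_k(
    candidates: Iterable[Match],
    k: int,
    relevance: Callable[[Match], float],
    distance: Callable[[Match, Match], float],
    lam: float = 0.5,
) -> List[Match]:
    """Repeatedly add the candidate that most increases the assumed objective."""
    remaining = list(candidates)
    selected: List[Match] = []
    while remaining and len(selected) < k:
        best = max(
            remaining,
            key=lambda v: diversification_score(selected + [v], relevance, distance, lam),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): "a" and "b" are both relevant but nearly identical.
_rel = {"a": 1.0, "b": 0.9, "c": 0.2}
_dist = {
    frozenset(("a", "b")): 0.0,
    frozenset(("a", "c")): 1.0,
    frozenset(("b", "c")): 1.0,
}


def toy_relevance(v: str) -> float:
    return _rel[v]


def toy_distance(u: str, v: str) -> float:
    return _dist[frozenset((u, v))]
```

With `lam=0` the selection degenerates to plain top-k by relevance; raising `lam` makes the second pick favor a match that is far from the first one.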
User Effort Minimization Through Adaptive Diversification
Cited by 2 (0 self)
Ambiguous queries, which are typical on search engines and recommendation systems, often return a large number of results from multiple interpretations. Given that many users perform their searches on limited-size screens (e.g., mobile phones), an important problem is which results to display first. Recent work has suggested displaying a set of results (top-k) based on their relevance score with respect to the query and their diversity with respect to each other. However, previous works mostly balance relevance and diversity in a predefined, fixed way. In this paper, we show that for different search tasks there is a different ideal balance of relevance and diversity. We propose a principled method for adaptive diversification of query results that minimizes the user effort to find the desired results, by dynamically balancing the relevance and diversity at each query step (e.g., when refining the query or viewing the next page of results). We introduce a navigation cost model as a means to estimate the effort required to navigate the query results, and show that the problem of estimating the ideal amount of diversification at each step is NP-hard. We propose an efficient approximate algorithm to select a near-optimal subset of the query results that minimizes the expected user effort. Finally, we demonstrate the efficacy and efficiency of our solution in minimizing user effort, compared to state-of-the-art ranking methods, by means of an extensive experimental evaluation and a comprehensive user study on Amazon
Multi-Query Diversification in Microblogging Posts
Cited by 1 (0 self)
Effectively exploring data generated by microblogging services is challenging due to its high volume and production rate. To address this issue, we propose a solution that helps users effectively consume information from a microblogging stream, by filtering out redundant data. We formalize our approach as a novel optimization problem termed Multi-Query Diversification Problem (MQDP). In MQDP, the input consists of a list of microblogging posts and a set of user queries (e.g., news topics), where each query matches a subset of posts. The objective is to compute the smallest subset of posts that cover all other posts with respect to a “diversity dimension” that may represent time or, say, sentiment. Roughly, the solution (cover) has the property that each covered post has nearby posts in the cover that are collectively related to all queries relevant to this covered post. This is distinct from previous single-query diversity problems, as we may have two nearby posts that are related to intersecting but not nested sets of queries, in which case none covers the other. Another key difference is that we do not define diversity in terms of post similarity, since posts are too short for this approach to be meaningful; instead, we focus on finding representative posts for ordered diversity dimensions like time and sentiment, which are critical in microblogging. For example, for time as the diversity dimension, the selected posts will show how certain news events unfolded over time. We prove that MQDP is NP-hard and we propose an exact dynamic programming algorithm that is feasible for small problem instances. We also propose two approximate algorithms with provable approximation bounds, and show how they can be adapted for a streaming setting. Through comprehensive experiments on real data, we show that our algorithms efficiently and effectively generate diverse and representative posts.
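As a rough illustration of covering posts along an ordered diversity dimension, the sketch below handles only a simplified single-query case: each post is reduced to one value (say, a timestamp), and a post counts as covered when a selected post lies within `radius` of it. The name `greedy_cover`, the `radius` threshold, and the single-query restriction are assumptions made for this example; this is not one of the MQDP algorithms from the paper, which must also account for overlapping query sets.

```python
from typing import List, Sequence


def greedy_cover(values: Sequence[float], radius: float) -> List[float]:
    """Pick a small subset of `values` so every value is within `radius` of a pick.

    Left-to-right greedy: take the leftmost uncovered value, choose the
    rightmost input value that still covers it, then skip everything that
    choice covers.
    """
    pts = sorted(values)
    cover: List[float] = []
    i, n = 0, len(pts)
    while i < n:
        left = pts[i]  # leftmost value not yet covered
        j = i
        # Advance to the rightmost value that is still within radius of `left`.
        while j + 1 < n and pts[j + 1] <= left + radius:
            j += 1
        center = pts[j]
        cover.append(center)
        # Skip every value that the chosen center covers.
        i = j + 1
        while i < n and pts[i] <= center + radius:
            i += 1
    return cover
```

For example, with timestamps `[0, 1, 2, 3, 10, 11]` and `radius=1`, the posts at 1, 3, and 11 are selected, and every other post is within one time unit of a selected one.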
POIKILO: A Tool for Evaluating the Results of Diversification Models and Algorithms.
Cited by 1 (1 self)
Search result diversification has attracted considerable attention as a means of improving the quality of results retrieved by user queries. In this demonstration, we present Poikilo, a tool to assist users in locating and evaluating diverse results. We provide implementations of a wide suite of models and algorithms to compute and compare diverse results. Users can tune various diversification parameters, combine diversity with relevance and also see how diverse results change over time in the case of streaming data.
Top-K Structural Diversity Search in Large Networks
Social contagion depicts a process of information (e.g., fads, opinions, news) diffusion in online social networks. A recent study reports that in a social contagion process the probability of contagion is tightly controlled by the number of connected components in an individual’s neighborhood. Such a number is termed the structural diversity of an individual, and it is shown to be a key predictor in the social contagion process. Based on this, a fundamental issue in a social network is to find the top-k users with the highest structural diversities. In this paper, we, for the first time, study the top-k structural diversity search problem in a large network. Specifically, we develop an effective upper bound of structural diversity for pruning the search space. The upper bound can be incrementally refined in the search process. Based on this upper bound, we propose an efficient framework for top-k structural diversity search. To further speed up the structural diversity evaluation in the search process, several carefully devised heuristic search strategies are proposed. Extensive experimental studies are conducted on 13 real-world large networks, and the results demonstrate the efficiency and effectiveness of the proposed methods.
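The definition above (the number of connected components in the subgraph induced by a node's neighbors) can be sketched directly. The code below is a brute-force baseline only: it scores every node and keeps the k largest, without the upper-bound pruning or heuristic strategies the abstract refers to. The adjacency-set representation and the function names are assumptions for illustration, and the graph is assumed to be undirected with no self-loops.

```python
import heapq
from typing import Dict, Hashable, List, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]


def structural_diversity(graph: Graph, node: Hashable) -> int:
    """Count connected components of the subgraph induced by `node`'s neighbors."""
    unseen = set(graph[node])  # neighbors not yet assigned to a component
    components = 0
    while unseen:
        components += 1
        stack = [unseen.pop()]
        while stack:
            u = stack.pop()
            # Follow only edges that stay inside the neighborhood of `node`.
            for w in graph[u]:
                if w in unseen:
                    unseen.remove(w)
                    stack.append(w)
    return components


def top_k_structural_diversity(graph: Graph, k: int) -> List[Tuple[int, Hashable]]:
    """Brute-force baseline: score every node, return the k highest (score, node) pairs."""
    scored = ((structural_diversity(graph, v), v) for v in graph)
    return heapq.nlargest(k, scored, key=lambda pair: pair[0])


# Toy undirected graph (hypothetical): a's neighbors split into {b, c} and {d}.
toy_graph: Graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
```

On `toy_graph`, node `a` has structural diversity 2, because `b` and `c` are linked to each other while `d` is isolated within the neighborhood; every other node scores 1.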
As-Soon-As-Possible Top-k Query Processing in P2P Systems
2013
Top-k query processing techniques provide two main advantages for unstructured peer-to-peer (P2P) systems. First, they avoid overwhelming users with too many results. Second, they significantly reduce network resource consumption. However, existing approaches suffer from long waiting times. This is because top-k results are returned only when all queried peers have finished processing the query. As a result, query response time is dominated by the slowest queried peer. In this paper, we address this users' waiting time problem. For this, we revisit top-k query processing in P2P systems by introducing two novel notions in addition to response time: the stabilization time and the cumulative quality gap. Using these notions, we formally define the as-soon-as-possible (ASAP) top-k processing problem. Then, we propose a family of algorithms called ASAP to deal with this problem. We validate our solution through implementation and extensive experimentation. The results show that ASAP significantly outperforms baseline algorithms by returning the final top-k result to users in much less time.
Spatial Cohesion Queries
Given a set of attractors and repellers, the cohesion query returns the point in the database that is as close to the attractors and as far from the repellers as possible. Cohesion queries find applications in various settings, such as facility location problems and location-based services. For example, when attractors represent favorable places, e.g., tourist attractions, and repellers denote undesirable locations, e.g., competitor stores, the cohesion query would return the ideal location, among a database of possible options, to open a new store. These queries are not trivial to process as the best location, unlike aggregate nearest or farthest neighbor queries, may be far from the optimal point in space. Therefore, to achieve sublinear performance in practice, we employ novel best-first search and branch-and-bound paradigms that take advantage of the geometrical interpretation of the problem. Our methods are up to orders of magnitude faster than linear scan and adaptations of existing aggregate nearest/farthest neighbor algorithms.
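For orientation, the linear-scan baseline mentioned above might look like the sketch below. The abstract does not give the scoring function, so `cohesion_score` here is purely an assumption: total Euclidean distance to the repellers minus total distance to the attractors. The function names are hypothetical as well, and the paper's best-first and branch-and-bound methods are not shown.

```python
import math
from typing import Iterable, Sequence, Tuple

Point = Tuple[float, ...]


def cohesion_score(p: Point, attractors: Sequence[Point], repellers: Sequence[Point]) -> float:
    """Assumed score: larger when p is far from repellers and close to attractors."""
    away = sum(math.dist(p, r) for r in repellers)
    near = sum(math.dist(p, a) for a in attractors)
    return away - near


def linear_scan_cohesion(
    points: Iterable[Point],
    attractors: Sequence[Point],
    repellers: Sequence[Point],
) -> Point:
    """Baseline: evaluate every database point and return the best-scoring one."""
    return max(points, key=lambda p: cohesion_score(p, attractors, repellers))
```

A scan like this visits every database point once per query, which is the cost the abstract's sublinear methods aim to avoid.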