Results 1-10 of 27
Discovering Popular Routes from Trajectories
In ICDE, 2011
Cited by 28 (0 self)
The booming industry of location-based services has accumulated a huge collection of users' location trajectories from driving, cycling, hiking, etc. In this work, we investigate the problem of discovering the Most Popular Route (MPR) between two locations by observing the traveling behaviors of many previous users. This new query is beneficial to travelers who are asking for directions or planning a trip in an unfamiliar city or area, as historical traveling experiences can reveal how people usually choose routes between locations. To achieve this goal, we first develop a Coherence Expanding algorithm to retrieve a transfer network from raw trajectories, which indicates all the possible movements between locations. After that, the Absorbing Markov Chain model is applied to derive a reasonable transfer probability for each transfer node in the network, which is subsequently used as the popularity indicator in the search phase. Finally, we propose a Maximum Probability Product algorithm to discover the MPR from a transfer network based on the popularity indicators in a breadth-first manner, and we illustrate the results and performance of the algorithm through extensive experiments.
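The idea of searching for a route with the maximum probability product can be illustrated with a short sketch. This is not the authors' Maximum Probability Product algorithm (the abstract describes a breadth-first search over node popularity indicators); it swaps in a standard Dijkstra-style search over negative log probabilities, and it assumes the transfer network is given as a hypothetical dictionary `transfer_prob` that maps each node to its successors and their transfer probabilities.

```python
import heapq
import math


def most_popular_route(transfer_prob, source, target):
    """Return (route, probability) with the largest product of edge probabilities.

    transfer_prob is an assumed input format: {node: {successor: probability}}.
    """
    # Maximizing a product of probabilities is the same as minimizing
    # the sum of -log(p), so a shortest-path search applies.
    best = {source: 0.0}
    parent = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        cost, node = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        if node == target:
            break
        for nxt, p in transfer_prob.get(node, {}).items():
            if p <= 0.0:
                continue  # a zero-probability transfer can never be used
            new_cost = cost - math.log(p)
            if new_cost < best.get(nxt, math.inf):
                best[nxt] = new_cost
                parent[nxt] = node
                heapq.heappush(heap, (new_cost, nxt))
    if target not in done:
        return None, 0.0
    # Walk the parent pointers back from the target to rebuild the route.
    route = [target]
    while route[-1] != source:
        route.append(parent[route[-1]])
    route.reverse()
    return route, math.exp(-best[target])


# Toy network with made-up probabilities, for illustration only.
network = {
    "A": {"B": 0.9, "C": 0.5},
    "B": {"D": 0.5},
    "C": {"D": 0.8},
}
route, prob = most_popular_route(network, "A", "D")
```

In this toy network the route through B has product 0.9 × 0.5 = 0.45, which beats the route through C at 0.5 × 0.8 = 0.40.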
Distance-constraint reachability computation in uncertain graphs
PVLDB
Cited by 26 (5 self)
Driven by emerging network applications, querying and mining uncertain graphs has become increasingly important. In this paper, we investigate a fundamental problem concerning uncertain graphs, which we call the distance-constraint reachability (DCR) problem: Given two vertices s and t, what is the probability that the distance from s to t is less than or equal to a user-defined threshold d in the uncertain graph? Since this problem is #P-complete, we focus on efficiently and accurately approximating DCR online. Our main results include two new estimators for the probabilistic reachability. One is a Horvitz-Thompson type estimator based on an unequal probability sampling scheme, and the other is a novel recursive sampling estimator, which effectively combines a deterministic recursive computational procedure with a sampling process to boost the estimation accuracy. Both estimators can produce much smaller variance than the direct sampling estimator, which considers each trial to be either 1 or 0. We also present methods to make these estimators more computationally efficient. A comprehensive experimental evaluation on both real and synthetic datasets demonstrates the efficiency and accuracy of our new estimators.
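For orientation, the direct sampling estimator that the abstract uses as its baseline can be sketched as follows. This is only the baseline, not either of the paper's two new estimators. The sketch assumes directed edges that exist independently, measures distance in hops (unweighted), and uses hypothetical names (`edges`, `estimate_dcr`) chosen for illustration.

```python
import random
from collections import deque


def within_distance(adj, s, t, d):
    """Breadth-first search: True if t is reachable from s in at most d hops."""
    frontier = deque([(s, 0)])
    seen = {s}
    while frontier:
        node, dist = frontier.popleft()
        if node == t:
            return True
        if dist == d:
            continue  # do not expand beyond the distance threshold
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return False


def estimate_dcr(edges, s, t, d, trials=10000, seed=0):
    """Direct sampling: the fraction of sampled worlds where dist(s, t) <= d.

    edges is an assumed input format: {(u, v): existence probability}.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Sample one possible world by keeping each edge independently.
        adj = {}
        for (u, v), p in edges.items():
            if rng.random() < p:
                adj.setdefault(u, []).append(v)
        # Each trial contributes either 1 or 0.
        hits += within_distance(adj, s, t, d)
    return hits / trials


# Toy uncertain graph with made-up probabilities.
toy = {("s", "a"): 0.5, ("a", "t"): 0.5, ("s", "t"): 0.5}
estimate = estimate_dcr(toy, "s", "t", 2)
```

For this toy graph the exact answer is 1 − (1 − 0.5)(1 − 0.25) = 0.625, so the estimate should land close to that value; the variance of this 0/1 estimator is what the paper's estimators aim to reduce.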
Efficient Subgraph Search over Large Uncertain Graphs
Cited by 14 (3 self)
Retrieving graphs containing a query graph from a large graph database is a key task in many graph-based applications, including chemical compound discovery, protein complex prediction, and structural pattern recognition. However, graph data handled by these applications is often noisy, incomplete, and inaccurate because of the way the data is produced. In this paper, we study subgraph queries over uncertain graphs. Specifically, we consider the problem of answering threshold-based probabilistic queries over a large uncertain graph database with the possible world semantics. We prove that the problem is #P-complete; therefore, we adopt a filtering-and-verification strategy to speed up the search. In the filtering phase, we use a probabilistic inverted index, PIndex, based on subgraph features obtained by an optimal feature selection process. During the verification phase, we develop exact and bound algorithms to validate the remaining candidates. Extensive experimental results demonstrate the effectiveness of the proposed algorithms.
Clustering Large Probabilistic Graphs
Cited by 6 (0 self)
We study the problem of clustering probabilistic graphs. Similar to the problem of clustering standard graphs, probabilistic graph clustering has numerous applications, such as finding complexes in probabilistic protein-protein interaction networks and discovering groups of users in affiliation networks. We extend the edit-distance-based definition of graph clustering to probabilistic graphs. We establish a connection between our objective function and correlation clustering to propose practical approximation algorithms for our problem. A benefit of our approach is that our objective function is parameter-free. Therefore, the number of clusters is part of the output. We also develop methods for testing the statistical significance of the output clustering and study the case of noisy clusterings. Using a real protein-protein interaction network and ground-truth data, we show that our methods discover the correct number of clusters and identify established protein relationships. Finally, we show the practicality of our techniques using a large social network of Yahoo! users consisting of one billion edges.
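One plausible reading of an edit-distance objective on a probabilistic graph is the expected number of edge edits needed to turn a sampled graph into a cluster graph (a disjoint union of cliques, one per cluster). The sketch below computes that quantity under the assumption that edges exist independently; the function name, input format, and toy values are illustrative assumptions, and the paper's exact formulation may differ.

```python
from itertools import combinations


def expected_edit_distance(nodes, edge_prob, cluster_of):
    """Expected number of edge edits between a probabilistic graph and a clustering.

    edge_prob is an assumed input format: {(u, v): probability}; missing pairs count as 0.
    cluster_of maps each node to a cluster label.
    """
    total = 0.0
    for u, v in combinations(nodes, 2):
        p = edge_prob.get((u, v), edge_prob.get((v, u), 0.0))
        if cluster_of[u] == cluster_of[v]:
            total += 1.0 - p  # edge is missing with probability 1 - p and must be added
        else:
            total += p  # edge exists with probability p and must be removed
    return total


# Toy example with made-up probabilities.
nodes = ["a", "b", "c"]
probs = {("a", "b"): 0.9, ("b", "c"): 0.2, ("a", "c"): 0.1}
two_clusters = expected_edit_distance(nodes, probs, {"a": 0, "b": 0, "c": 1})
one_cluster = expected_edit_distance(nodes, probs, {"a": 0, "b": 0, "c": 0})
```

Here grouping a and b together and leaving c alone costs 0.1 + 0.2 + 0.1 = 0.4, while a single cluster costs 0.1 + 0.8 + 0.9 = 1.8. No cluster count is supplied as a parameter: the objective alone prefers the two-cluster solution, which matches the abstract's remark that the number of clusters is part of the output.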
Efficient Subgraph Similarity Search on Large Probabilistic Graph Databases
Cited by 6 (2 self)
Many studies have sought efficient solutions for subgraph similarity search over certain (deterministic) graphs due to its wide application in many fields, including bioinformatics, social network analysis, and Resource Description Framework (RDF) data management. All these works assume that the underlying data are certain. However, in reality, graphs are often noisy and uncertain due to various factors, such as errors in data extraction, inconsistencies in data integration, and privacy-preserving purposes. Therefore, in this paper, we study subgraph similarity search on large probabilistic graph databases. Unlike previous works, which assume that edges in an uncertain graph are independent of each other, we study uncertain graphs where edges' occurrences are correlated. We formally prove that subgraph similarity search over probabilistic graphs is #P-complete; thus, we employ a filter-and-verify framework to speed up the search. In the filtering phase, we develop tight lower and upper bounds of subgraph similarity probability based on a probabilistic matrix index, PMI. PMI is composed of discriminative subgraph features associated with tight lower and upper bounds of subgraph isomorphism probability. Based on PMI, we can sort out a large number of probabilistic graphs and maximize the pruning capability. During the verification phase, we develop an efficient sampling algorithm to validate the remaining candidates. The efficiency of our proposed solutions has been verified through extensive experiments.
Efficient Discovery of Frequent Subgraph Patterns in Uncertain Graph Databases
Cited by 6 (0 self)
Mining frequent subgraph patterns in graph databases is a challenging and important problem with applications in several domains. Recently, there has been growing interest in generalizing the problem to uncertain graphs, which can model the inherent uncertainty in the data of many applications. The main difficulty in solving this problem results from the large number of candidate subgraph patterns to be examined and the large number of subgraph isomorphism tests required to find the graphs that contain a given pattern. The latter becomes even more challenging when dealing with uncertain graphs. In this paper, we propose a method that uses an index of the uncertain graph database to reduce the number of comparisons needed to find frequent subgraph patterns. The proposed algorithm relies on the apriori property for enumerating candidate subgraph patterns efficiently. Then, the index is used to reduce the number of comparisons required for computing the expected support of each candidate pattern. It also enables additional optimizations with respect to scheduling and early termination that further increase the efficiency of the method. The evaluation of our approach on three real-world datasets as well as on synthetic uncertain graph databases demonstrates significant cost savings with respect to the state-of-the-art approach.
MWGen: A Mini World Generator
In MDM, To Appear, 2012
Cited by 5 (4 self)
GMOD (Generic Moving Objects Database) is a database system that manages moving objects traveling through different environments and with multiple transportation modes, like Walk → Car → Indoor, as humans' movement can cover several different environments (e.g., road network, indoor) instead of a single environment. To evaluate the performance of GMOD, a comprehensive and scalable dataset consisting of all available environments (e.g., roads, bus network, buildings) and moving objects with multiple modes is needed, where the location of a moving object is represented by reference to the underlying environment. Due to the difficulty of obtaining real datasets, in this paper we present a tool that creates the overall space, which is composed of the following environments: road network, bus network, metro network, pavement areas, and indoor. Each environment is also called an infrastructure. All outdoor infrastructures are produced from a real road dataset, and the indoor environment, consisting of a set of buildings, is generated from public floor plans. Within each infrastructure, we design a graph as well as an algorithm for trip planning, such as indoor navigation and routing in the bus network. The time complexity of the algorithm is also analyzed. A complete navigation system through all environments is developed, which is used to guide data generation for moving objects covering all available environments. The generated data, including all infrastructures and moving objects, is managed by GMOD. We report the experimental results of the data generator by conducting experiments on two real road datasets and a set of public floor plans.
Reliable Clustering on Uncertain Graphs
Cited by 5 (1 self)
Many graphs in practical applications are not deterministic, but are probabilistic in nature because the existence of the edges is inferred with the use of a variety of statistical approaches. In this paper, we examine the problem of clustering uncertain graphs. Uncertain graphs are best clustered with the use of a possible worlds model in which the most reliable clusters are discovered in the presence of uncertainty. Reliable clusters are those which are not likely to be disconnected in the context of different instantiations of the uncertain graph. We provide a generalized reliability measurement from two basic intuitions (purity and size balance) to overcome the challenges of the standard reliability criterion, and develop a novel k-means algorithm to solve the uncertain clustering problem. We present experimental results which illustrate the effectiveness and efficiency of our model and approaches. Keywords: uncertain graph; clustering; reliability
ProbTree: A Query-Efficient Representation of Probabilistic Graphs Technical Paper
Cited by 2 (1 self)
Information in many applications, such as mobile wireless systems, social networks, and road networks, is captured by graphs, in many cases uncertain. We study the problem of querying a probabilistic graph; in particular, we examine “source-to-target” queries (ST-queries), such as computing the shortest path between two vertices. Evaluating ST-queries over probabilistic graphs is #P-hard, as it requires examining an exponential number of “possible worlds”. Existing solutions to the ST-query problem, which sample possible worlds, have two downsides: (i) many samples are needed for reasonable accuracy, and (ii) a possible world can be very large. To tackle these issues, we study the ProbTree, a data structure that stores a succinct representation of the probabilistic graph. Existing ST-query solutions are executed on top of this structure, with the number of samples and possible world sizes reduced.
Node classification in uncertain graphs
In SSDBM, 2014
Cited by 2 (2 self)
In many real applications that use and analyze networked data, the links in the network graph may be erroneous, or derived from probabilistic techniques. In such cases, the node classification problem can be challenging, since the unreliability of the links may affect the final results of the classification process. In this paper, we focus on situations that require the analysis of the uncertainty that is present in the graph structure. We study the novel problem of node classification in uncertain graphs, by treating uncertainty as a first-class citizen. We propose two techniques based on a Bayes model, and show the benefits of incorporating uncertainty in the classification process as a first-class citizen. The experimental results demonstrate the effectiveness of our approaches.