Ranking-based clustering of heterogeneous information networks with star network schema
In: Proc. 2009 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD 2009), 2009
Cited by 84 (30 self)
Abstract
A heterogeneous information network is an information network composed of multiple types of objects. Clustering on such a network may lead to a better understanding of both the hidden structures of the network and the individual role played by every object in each cluster. However, although clustering on homogeneous networks has been studied for decades, clustering on heterogeneous networks has not been addressed until recently. A recent study proposed a new algorithm, RankClus, for clustering on bi-typed heterogeneous networks. However, a real-world network may consist of more than two types, and the interactions among multi-typed objects play a key role in disclosing the rich semantics that a network carries. In this paper, we study clustering of multi-typed heterogeneous networks with a star network schema and propose a novel algorithm, NetClus, that utilizes links across multi-typed objects to generate high-quality net-clusters. An iterative enhancement method is developed that leads to effective ranking-based clustering in such heterogeneous networks. Our experiments on DBLP data show that NetClus generates more accurate clustering results than the baseline topic model algorithm PLSA and the recently proposed algorithm, RankClus. Further, NetClus generates informative clusters, presenting good ranking and cluster membership information for each attribute object in each net-cluster.
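To make the idea of per-cluster ranking concrete, here is a minimal Python sketch of a degree-based "simple ranking" of attribute objects (for example, venues) inside one net-cluster of a star-schema network. It is an illustration under assumptions, not NetClus itself: the function name `simple_rank` and the toy paper-venue links are hypothetical, and NetClus's actual ranking functions, smoothing, and iterative reassignment of target objects are not reproduced.

```python
from collections import Counter

def simple_rank(links):
    """Degree-based ranking of attribute objects within one net-cluster.

    links: iterable of (target_object, attribute_object) pairs, e.g. (paper, venue).
    Returns {attribute_object: share of the cluster's links}, summing to 1.
    """
    counts = Counter(attr for _, attr in links)
    total = sum(counts.values())
    if total == 0:
        return {}  # empty cluster: nothing to rank
    return {attr: c / total for attr, c in counts.items()}

# Hypothetical net-cluster: four papers and the venues they appeared in.
cluster_links = [("p1", "KDD"), ("p2", "KDD"), ("p3", "ICDM"), ("p4", "KDD")]
ranking = simple_rank(cluster_links)
print(sorted(ranking.items(), key=lambda kv: -kv[1]))  # [('KDD', 0.75), ('ICDM', 0.25)]
```

In a ranking-based clustering loop, distributions like this one would be recomputed per cluster and then used to reassign target objects (here, papers) to clusters; that outer loop is omitted.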
Scalable Network Distance Browsing in Spatial Databases
2008
Cited by 80 (8 self)
Abstract
An algorithm is presented for finding the k nearest neighbors in a spatial network in a best-first manner using network distance. The algorithm is based on precomputing the shortest paths between all possible vertices in the network and then making use of an encoding that takes advantage of the fact that the shortest paths from vertex u to all of the remaining vertices can be decomposed into subsets based on the first edges on the shortest paths to them from u. Thus, in the worst case, the amount of work depends on the number of objects that are examined and the number of links on the shortest paths to them from the query point q, rather than on the number of vertices in the network. The amount of storage required to keep track of the subsets is reduced by taking advantage of their spatial coherence, which is captured with the aid of a shortest-path quadtree. In particular, experiments on a number of large road networks as ...
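The first-edge decomposition described above can be sketched with a single Dijkstra run that records, for every destination, the neighbor of u through which its shortest path leaves u. This minimal Python sketch assumes an adjacency-list graph with non-negative weights; `first_edge_partition` and the toy graph are hypothetical, and the shortest-path quadtree that compresses each subset by spatial coherence is not shown.

```python
import heapq
from collections import defaultdict

def first_edge_partition(graph, u):
    """Group every vertex reachable from u by the first hop of its shortest path.

    graph: {vertex: [(neighbor, weight), ...]} with non-negative weights.
    Returns {first_hop: set_of_destinations}.
    """
    dist = {u: 0.0}
    first = {}          # destination -> neighbor of u that starts its shortest path
    heap = [(0.0, u)]
    settled = set()
    while heap:
        d, v = heapq.heappop(heap)
        if v in settled:
            continue    # stale heap entry
        settled.add(v)
        for w, weight in graph.get(v, []):
            nd = d + weight
            if nd < dist.get(w, float("inf")):
                dist[w] = nd
                # An edge out of u starts a new subset; otherwise inherit v's subset.
                first[w] = w if v == u else first[v]
                heapq.heappush(heap, (nd, w))
    groups = defaultdict(set)
    for dest, hop in first.items():
        groups[hop].add(dest)
    return dict(groups)

graph = {
    "a": [("b", 1), ("c", 2)],
    "b": [("a", 1), ("c", 2), ("d", 5)],
    "c": [("a", 2), ("b", 2), ("d", 1)],
    "d": [("b", 5), ("c", 1)],
}
print(first_edge_partition(graph, "a"))
```

Here the subset for first hop b is {b}, and the subset for first hop c is {c, d}, since the cheapest route to d (cost 3) leaves a through c.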
An efficient and scalable approach to CNN queries in a road network
In Proc. of VLDB, 2005
Cited by 60 (0 self)
Abstract
A continuous search in a road network retrieves the objects that satisfy a query condition at any point on a path. For example, return the three nearest restaurants from all locations on my route from point s to point e. In this paper, we deal with nearest neighbor (NN) queries as well as continuous NN (CNN) queries in the context of moving objects databases. The performance of existing approaches based on network distance, such as the shortest path length, depends largely on the density of objects of interest. To overcome this problem, we propose UNICONS (a unique continuous search algorithm) for NN queries and CNN queries performed on a network. We incorporate the use of precomputed NN lists into Dijkstra's algorithm for NN queries. A mathematical rationale is employed to produce the final results of CNN queries. Experimental results for real-life datasets of various sizes show that UNICONS outperforms its competitors by up to 3.5 times for NN queries and 5 times for CNN queries, depending on the density of objects and the number of NNs required.
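For contrast, the plain network-expansion approach, whose cost the abstract says depends on object density, can be sketched as Dijkstra's algorithm that stops once k object-bearing vertices have been settled. This is a baseline under stated assumptions (objects sit on vertices; `network_knn` and the toy graph are hypothetical), not UNICONS, whose precomputed NN lists are not modeled here.

```python
import heapq

def network_knn(graph, objects, source, k):
    """Dijkstra-style network expansion from `source`.

    graph: {vertex: [(neighbor, weight), ...]}, non-negative weights.
    objects: set of vertices that hold an object of interest.
    Returns up to k (network_distance, vertex) pairs in ascending order.
    """
    heap = [(0.0, source)]
    settled = set()
    result = []
    while heap and len(result) < k:
        d, v = heapq.heappop(heap)
        if v in settled:
            continue        # stale entry: v was already reached more cheaply
        settled.add(v)
        if v in objects:
            result.append((d, v))
        for w, weight in graph.get(v, []):
            if w not in settled:
                heapq.heappush(heap, (d + weight, w))
    return result

road = {
    "s": [("x", 2)],
    "x": [("s", 2), ("y", 3)],
    "y": [("x", 3), ("z", 1)],
    "z": [("y", 1)],
}
print(network_knn(road, {"y", "z"}, "s", 2))  # [(5.0, 'y'), (6.0, 'z')]
```

The sparser the objects, the more vertices must be settled before k of them are found, which is the density sensitivity the abstract attributes to existing approaches.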
Aggregate Nearest Neighbor Queries in Spatial Databases
TODS, 2005
Cited by 59 (6 self)
Abstract
Given two spatial datasets P (e.g., facilities) and Q (queries), an aggregate nearest neighbor (ANN) query retrieves the point(s) of P with the smallest aggregate distance(s) to points in Q. Assuming, for example, n users at locations q_1, ..., q_n, an ANN query outputs the facility p ∈ P that minimizes the sum of distances |pq_i| for 1 ≤ i ≤ n that the users have to travel in order to meet there. Similarly, another ANN query may report the point p ∈ P that minimizes the maximum distance that any user has to travel, or the minimum distance from some user to his/her closest facility. Assuming that Q fits in memory and P is indexed by an R-tree, we develop algorithms for aggregate nearest neighbors that capture several versions of the problem, including weighted queries and incremental reporting of results. Then, we analyze their performance and propose cost models for query optimization. Finally, we extend our techniques to disk-resident queries and approximate ANN retrieval. The efficiency of the algorithms and the accuracy of the cost models are evaluated through extensive experiments with real and synthetic datasets.
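The three aggregate functions mentioned above (sum, max, min) can be illustrated with a brute-force scan over P. This Python sketch assumes Euclidean distance and small in-memory sets; the name `ann` and the toy points are hypothetical. It deliberately omits the R-tree-based pruning and cost models the paper develops, so it only pins down what the query returns, not how to answer it efficiently.

```python
import math

# The aggregate functions named in the abstract.
AGGREGATES = {"sum": sum, "max": max, "min": min}

def ann(P, Q, agg="sum", weights=None):
    """Brute-force ANN: the p in P minimising agg_i(w_i * |p q_i|).

    P, Q: sequences of (x, y) points; weights: optional per-query weights.
    """
    f = AGGREGATES[agg]
    w = weights if weights is not None else [1.0] * len(Q)

    def cost(p):
        return f(wi * math.dist(p, q) for wi, q in zip(w, Q))

    return min(P, key=cost)  # ties resolved in favour of the earlier point

P = [(0, 0), (5, 0), (10, 0)]   # facilities
Q = [(0, 1), (10, 1)]           # user locations
print(ann(P, Q, "sum"), ann(P, Q, "max"), ann(P, Q, "min"))
# (5, 0) (5, 0) (0, 0)
```

Sum and max both favour the facility midway between the two users, while min is satisfied by any facility next to a single user.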
PathSim: Meta path-based top-k similarity search in heterogeneous information networks
In VLDB '11, 2011
Cited by 58 (23 self)
Abstract
Similarity search is a primitive operation in database and Web search engines. With the advent of large-scale heterogeneous information networks that consist of multi-typed, interconnected objects, such as bibliographic networks and social media networks, it is important to study similarity search in such networks. Intuitively, two objects are similar if they are linked by many paths in the network. However, most existing similarity measures are defined for homogeneous networks and do not take the different semantic meanings behind paths into consideration, so they cannot be directly applied to heterogeneous networks. In this paper, we study similarity search defined among objects of the same type in heterogeneous networks. Moreover, by considering different linkage paths in a network, one could derive various similarity semantics. Therefore, we introduce the concept ...
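The abstract is cut off before the measure is defined. As an assumption to check against the full paper, the sketch below uses the PathSim definition for a symmetric meta path, s(x, y) = 2 M[x][y] / (M[x][x] + M[y][y]), where M counts path instances along the meta path (here author-paper-author, computed as M = W W^T from a hypothetical author-paper incidence matrix W). Function names and data are illustrative only.

```python
def commuting_matrix(W):
    """M = W W^T: M[i][j] counts meta-path instances i -> paper -> j."""
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in W] for ri in W]

def pathsim(M, x, y):
    """PathSim for a symmetric meta path: 2 M[x][y] / (M[x][x] + M[y][y])."""
    denom = M[x][x] + M[y][y]
    return 2 * M[x][y] / denom if denom else 0.0

# Hypothetical author-paper incidence matrix (rows: authors, columns: papers).
W_AP = [
    [1, 1, 1, 0],  # author 0
    [1, 1, 0, 0],  # author 1
    [0, 0, 1, 1],  # author 2
]
M = commuting_matrix(W_AP)
print(pathsim(M, 0, 1), pathsim(M, 0, 2), pathsim(M, 1, 2))  # 0.8 0.4 0.0
```

Normalizing by each object's own path count (M[x][x]) keeps a highly connected object from scoring high against everyone simply because it has many paths.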
Continuous Nearest Neighbor Monitoring in Road Networks
In Proceedings of the 32nd VLDB Conference, 2006
Cited by 54 (2 self)
Abstract
Recent research has focused on continuous monitoring of nearest neighbors (NN) in highly dynamic scenarios, where the queries and the data objects move frequently and arbitrarily. All existing methods, however, assume the Euclidean distance metric. In this paper we study kNN monitoring in road networks, where the distance between a query and a data object is determined by the length of the shortest path connecting them. We propose two methods that can handle arbitrary object and query moving patterns, as well as fluctuations of edge weights. The first one maintains the query results by processing only updates that may invalidate the current NN sets. The second method follows the shared execution paradigm to reduce the processing time. In particular, it groups together the queries that fall in the path between two consecutive intersections in the network, and produces their results by monitoring the NN sets of these intersections. We experimentally verify the applicability of the proposed techniques to continuous monitoring of large data and query sets.
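The shared-execution idea above rests on a simple property, stated here as our own reasoning rather than the paper's algorithm: if no data objects lie on the edge (a, b), every shortest path from a query on that edge leaves through a or b, so the query's k nearest neighbors are among the union of the k-NN sets of a and b, at distance offset + d(a, o) or (length - offset) + d(b, o). A minimal Python sketch under that assumption (`knn_on_edge` and the toy lists are hypothetical):

```python
import heapq
import math

def knn_on_edge(nn_a, nn_b, offset, length, k):
    """k nearest objects for a query on edge (a, b), `offset` away from a.

    nn_a, nn_b: monitored k-NN results of intersections a and b,
    as {object: network_distance}. Assumes no objects lie on the edge.
    """
    best = {}
    for obj, d in nn_a.items():            # paths leaving through a
        best[obj] = min(best.get(obj, math.inf), offset + d)
    for obj, d in nn_b.items():            # paths leaving through b
        best[obj] = min(best.get(obj, math.inf), (length - offset) + d)
    return heapq.nsmallest(k, ((d, obj) for obj, d in best.items()))

nn_a = {"o1": 2, "o2": 5}
nn_b = {"o3": 1, "o2": 4}
print(knn_on_edge(nn_a, nn_b, offset=4, length=10, k=2))  # [(6, 'o1'), (7, 'o3')]
```

Every query on the same edge reuses the same two monitored lists, so updates only need to touch the intersections' results rather than each query separately.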
On trip planning queries in spatial databases
In SSTD, 2005
Cited by 40 (1 self)
Abstract
In this paper we discuss a new type of query in Spatial Databases, called the Trip Planning Query (TPQ). Given a set of points of interest P in space, where each point belongs to a specific category, a starting point S and a destination E, TPQ retrieves the best trip that starts at S, passes through at least one point from each category, and ends at E. For example, a driver traveling from Boston to Providence might want to stop at a gas station, a bank and a post office on his way, and the goal is to provide him with the best possible route (in terms of distance, traffic, road conditions, etc.). The difficulty of this query lies in the existence of multiple choices per category. In this paper, we study fast approximation algorithms for TPQ in a metric space. We provide a number of approximation algorithms with approximation ratios that depend on either the number of categories, the maximum number of points ...
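To show why multiple choices per category make TPQ nontrivial, here is one simple greedy heuristic in Euclidean space: from the current location, visit the nearest point of any category not yet covered, then head to E. This is an illustrative sketch, not necessarily one of the paper's algorithms, and no approximation ratio is claimed for it; `greedy_trip` and the toy points are hypothetical.

```python
import math

def greedy_trip(start, end, points):
    """Greedy trip covering every category once.

    points: list of (category, (x, y)). From the current location, repeatedly
    move to the nearest point whose category is not yet covered, then go to end.
    Returns (route, total_length).
    """
    remaining = {c for c, _ in points}
    route, here, total = [start], start, 0.0
    while remaining:
        cat, p = min(((c, q) for c, q in points if c in remaining),
                     key=lambda cq: math.dist(here, cq[1]))
        total += math.dist(here, p)
        route.append(p)
        remaining.discard(cat)   # one point per category is enough
        here = p
    total += math.dist(here, end)
    route.append(end)
    return route, total

pois = [("gas", (2, 0)), ("gas", (9, 5)), ("bank", (5, 1)), ("post", (8, 0))]
route, total = greedy_trip((0, 0), (10, 0), pois)
print(route)  # [(0, 0), (2, 0), (5, 1), (8, 0), (10, 0)]
```

Greedy choices are cheap to compute but can be myopic: the nearest point of some category may pull the trip away from E, which is why bounded-ratio algorithms are of interest.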
The V*-Diagram: A Query-Dependent Approach to Moving KNN Queries
2008
Cited by 34 (9 self)
Abstract
The moving k nearest neighbor (MkNN) query continuously finds the k nearest neighbors of a moving query point. The high potential for reducing the query processing cost, as well as the wide spectrum of associated applications, has attracted considerable attention to this query type from the database community. This paper presents an incremental safe-region-based technique for answering MkNN queries, called the V*-Diagram. In general, a safe region is a set of points where the query point can move without changing the query answer. Traditional safe-region approaches compute a safe region based on the data objects but independently of the query location. Our approach exploits the current knowledge of the query point and the search space in addition to the data objects. As a result, the V*-Diagram has much smaller I/O and computation costs than existing methods. The experimental results show that the V*-Diagram outperforms the best existing technique by two orders of magnitude.
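The safe-region idea can be illustrated with a conservative circular region derived from the triangle inequality: if d_k and d_{k+1} are the distances to the k-th and (k+1)-th nearest objects, a move of less than (d_{k+1} - d_k) / 2 changes every distance by less than that amount, so the kNN set cannot change. This Python sketch is far simpler than the V*-Diagram (it ignores the search-space knowledge the abstract mentions); `knn_with_safe_radius` and the data are hypothetical.

```python
import math

def knn_with_safe_radius(query, data, k):
    """Return (knn_set, r) where moving the query by less than r keeps knn_set.

    A move of length delta changes each distance by at most delta, so members
    (<= d_k + delta) stay closer than non-members (>= d_{k+1} - delta)
    while 2 * delta < d_{k+1} - d_k. Requires len(data) > k.
    """
    ranked = sorted(data, key=lambda p: math.dist(query, p))
    d_k = math.dist(query, ranked[k - 1])
    d_k1 = math.dist(query, ranked[k])
    return set(ranked[:k]), (d_k1 - d_k) / 2

data = [(1, 0), (0, 2), (4, 0), (0, -6)]
knn, r = knn_with_safe_radius((0, 0), data, 2)
print(sorted(knn), r)  # [(0, 2), (1, 0)] 1.0
```

While the moving query stays strictly inside this circle no recomputation is needed; leaving it triggers a fresh kNN search and a new radius.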
Efficient query processing on spatial networks
In Proceedings of the 13th ACM International Symposium on Advances in Geographic Information Systems, 2005
Cited by 30 (14 self)
Abstract
A framework for determining the shortest path and the distance between every pair of vertices on a spatial network is presented. The framework, termed SILC, uses path coherence between the shortest path and the spatial positions of vertices on the spatial network, thereby resulting in an encoding that is compact in representation and fast in path and distance retrievals. Using this framework, a wide variety of spatial queries such as incremental nearest neighbor searches and spatial distance joins can be shown to work on datasets of locations residing on a spatial network of sufficiently large size. The suggested framework is suitable for both main-memory and disk-resident datasets.
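The retrieval side of such an encoding can be sketched with a next-hop table: for each source u, store the vertex that follows u on a shortest path to every destination, then recover a path by repeated lookup. This minimal Python sketch stores the table uncompressed (quadratic space), whereas SILC compresses each vertex's map using path coherence and a shortest-path quadtree; `next_hop_table`, `shortest_path`, and the toy graph are hypothetical.

```python
import heapq
import math

def next_hop_table(graph):
    """table[u][v] = vertex after u on a shortest u->v path (one Dijkstra per u).

    graph: {vertex: [(neighbor, weight), ...]} with positive weights.
    """
    table = {}
    for u in graph:
        dist, hop = {u: 0}, {}
        heap = [(0, u)]
        while heap:
            d, v = heapq.heappop(heap)
            if d > dist[v]:
                continue                      # stale heap entry
            for w, weight in graph.get(v, []):
                nd = d + weight
                if nd < dist.get(w, math.inf):
                    dist[w] = nd
                    hop[w] = w if v == u else hop[v]
                    heapq.heappush(heap, (nd, w))
        table[u] = hop
    return table

def shortest_path(table, u, v):
    """Recover the u->v path by following next hops until v is reached."""
    path = [u]
    while path[-1] != v:
        path.append(table[path[-1]][v])
    return path

graph = {
    "a": [("b", 1), ("c", 2)],
    "b": [("a", 1), ("c", 2), ("d", 5)],
    "c": [("a", 2), ("b", 2), ("d", 1)],
    "d": [("b", 5), ("c", 1)],
}
table = next_hop_table(graph)
print(shortest_path(table, "a", "d"))  # ['a', 'c', 'd']
```

Each lookup costs one table access per vertex on the path, which is why a compact but directly queryable encoding of these per-vertex maps matters.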
Distance indexing on road networks
In PVLDB, 2006
Cited by 27 (0 self)
Abstract
The processing of kNN and continuous kNN queries on spatial network databases (SNDB) has been intensively studied recently. However, there is a lack of systematic study on the computation of network distances, which is the most fundamental difference between a road network and a Euclidean space. Since the online Dijkstra's algorithm has been shown to be efficient only for short distances, we propose an efficient index, called distance signature, for distance computation and query processing over long distances. Distance signature discretizes the distances between objects and network nodes into categories and then encodes these categories. To minimize the storage and search costs, we present the optimal category partition, and the encoding and compression algorithms for the signatures, based on a simplified network topology. Through mathematical analysis and experimental study, we show that the signature index is efficient and robust for various data distributions, query workloads, parameter settings, and network updates.
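The discretize-then-encode step can be sketched as mapping each exact distance to a category index whose bounds are what a query uses for pruning. This Python sketch assumes exponentially growing category boundaries c, c·α, c·α², ... purely for illustration; the paper derives its own optimal partition, and `make_categories`, `encode`, and `bounds_of` are hypothetical names.

```python
import bisect
import math

def make_categories(c, alpha, m):
    """Hypothetical exponential boundaries [c, c*alpha, ..., c*alpha**(m-1)]."""
    return [c * alpha ** i for i in range(m)]

def encode(distance, bounds):
    """Category index: 0 covers [0, bounds[0]), i covers [bounds[i-1], bounds[i]),
    and len(bounds) covers [bounds[-1], infinity)."""
    return bisect.bisect_right(bounds, distance)

def bounds_of(category, bounds):
    """Lower and upper distance bounds implied by a category index."""
    lo = 0 if category == 0 else bounds[category - 1]
    hi = bounds[category] if category < len(bounds) else math.inf
    return lo, hi

bounds = make_categories(10, 2, 4)                      # [10, 20, 40, 80]
print([encode(d, bounds) for d in (5, 10, 35, 100)])    # [0, 1, 2, 4]
print(bounds_of(2, bounds))                             # (20, 40)
```

A query can compare category bounds instead of exact distances, for example skipping an object whose lower bound already exceeds the current k-th best distance, and compute exact distances only for ambiguous cases.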