Ranking-based clustering of heterogeneous information networks with star network schema
- In: Proc. 2009 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD 2009)
, 2009
Cited by 85 (30 self)
A heterogeneous information network is an information network composed of multiple types of objects. Clustering on such a network may lead to better understanding of both hidden structures of the network and the individual role played by every object in each cluster. However, although clustering on homogeneous networks has been studied for decades, clustering on heterogeneous networks has not been addressed until recently. A recent study proposed a new algorithm, RankClus, for clustering on bi-typed heterogeneous networks. However, a real-world network may consist of more than two types, and the interactions among multi-typed objects play a key role in disclosing the rich semantics that a network carries. In this paper, we study clustering of multi-typed heterogeneous networks with a star network schema and propose a novel algorithm, NetClus, that utilizes links across multi-typed objects to generate high-quality net-clusters. An iterative enhancement method is developed that leads to effective ranking-based clustering in such heterogeneous networks. Our experiments on DBLP data show that NetClus generates more accurate clustering results than the baseline topic model algorithm PLSA and the recently proposed algorithm, RankClus. Further, NetClus generates informative clusters, presenting good ranking and cluster membership information for each attribute object in each net-cluster.
Scalable Network Distance Browsing in Spatial Databases
, 2008
Cited by 84 (10 self)
An algorithm is presented for finding the k nearest neighbors in a spatial network in a best-first manner using network distance. The algorithm is based on precomputing the shortest paths between all possible vertices in the network and then making use of an encoding that takes advantage of the fact that the shortest paths from vertex u to all of the remaining vertices can be decomposed into subsets based on the first edges on the shortest paths to them from u. Thus, in the worst case, the amount of work depends on the number of objects that are examined and the number of links on the shortest paths to them from q, rather than depending on the number of vertices in the network. The amount of storage required to keep track of the subsets is reduced by taking advantage of their spatial coherence which is captured by the aid of a shortest path quadtree. In particular, experiments on a number of large road networks as
PathSim: Meta path-based top-k similarity search in heterogeneous information networks
- In VLDB’11
, 2011
Cited by 68 (27 self)
Similarity search is a primitive operation in database and Web search engines. With the advent of large-scale heterogeneous information networks that consist of multi-typed, interconnected objects, such as the bibliographic networks and social media networks, it is important to study similarity search in such networks. Intuitively, two objects are similar if they are linked by many paths in the network. However, most existing similarity measures are defined for homogeneous networks. Different semantic meanings behind paths are not taken into consideration. Thus they cannot be directly applied to heterogeneous networks. In this paper, we study similarity search that is defined among the same type of objects in heterogeneous networks. Moreover, by considering different linkage paths in a network, one could derive various similarity semantics. Therefore, we introduce the concept
An efficient and scalable approach to CNN queries in a road network
- In Proc. of VLDB
, 2005
Cited by 60 (0 self)
A continuous search in a road network retrieves the objects which satisfy a query condition at any point on a path. For example, return the three nearest restaurants from all locations on my route from point s to point e. In this paper, we deal with NN queries as well as continuous NN queries in the context of moving objects databases. The performance of existing approaches based on the network distance such as the shortest path length depends largely on the density of objects of interest. To overcome this problem, we propose UNICONS (a unique continuous search algorithm) for NN queries and CNN queries performed on a network. We incorporate the use of precomputed NN lists into Dijkstra’s algorithm for NN queries. A mathematical rationale is employed to produce the final results of CNN queries. Experimental results for real-life datasets of various sizes show that UNICONS outperforms its competitors by up to 3.5 times for NN queries and 5 times for CNN queries depending on the density of objects and the number of NNs required.
Aggregate Nearest Neighbor Queries in Spatial Databases
- TODS
, 2005
Cited by 58 (6 self)
Given two spatial datasets P (e.g., facilities) and Q (queries), an aggregate nearest neighbor (ANN) query retrieves the point(s) of P with the smallest aggregate distance(s) to points in Q. Assuming, for example, n users at locations q1, ..., qn, an ANN query outputs the facility p ∈ P that minimizes the sum of distances |pqi| for 1 ≤ i ≤ n that the users have to travel in order to meet there. Similarly, another ANN query may report the point p ∈ P that minimizes the maximum distance that any user has to travel, or the minimum distance from some user to his/her closest facility. If Q fits in memory and P is indexed by an R-tree, we develop algorithms for aggregate nearest neighbors that capture several versions of the problem, including weighted queries and incremental reporting of results. Then, we analyze their performance and propose cost models for query optimization. Finally, we extend our techniques for disk-resident queries and approximate ANN retrieval. The efficiency of the algorithms and the accuracy of the cost models are evaluated through extensive experiments with real and synthetic datasets.
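As a small illustration of the query definition in the abstract above, the following sketch evaluates an ANN query by brute force under the three aggregates it mentions (sum, maximum, minimum). This is not the R-tree-based algorithm the paper develops; the function name `aggregate_nn`, the use of Euclidean distance, and the sample coordinates are assumptions made only for illustration.

```python
import math

def aggregate_nn(P, Q, agg=sum):
    """Return the point of P with the smallest aggregate distance to Q.

    Hypothetical brute-force helper: it scans every point of P and
    does not use an index. `agg` is sum, max, or min, matching the
    three query variants described in the abstract.
    """
    def agg_dist(p):
        # Aggregate the Euclidean distances from p to every query point.
        return agg(math.dist(p, q) for q in Q)
    return min(P, key=agg_dist)

# Hypothetical data: three facilities and three user locations on a line.
facilities = [(0.0, 0.0), (5.0, 0.0), (10.0, 0.0)]
users = [(1.0, 0.0), (9.0, 0.0), (10.0, 0.0)]

best_sum = aggregate_nn(facilities, users, sum)  # least total travel
best_max = aggregate_nn(facilities, users, max)  # least worst-case travel
best_min = aggregate_nn(facilities, users, min)  # closest to some user
```

With this data the sum and max variants pick different facilities, which shows why the choice of aggregate function is part of the query.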
Continuous Nearest Neighbor Monitoring in Road Networks
- Proceedings of the 32nd VLDB Conference
, 2006
Cited by 54 (2 self)
Recent research has focused on continuous monitoring of nearest neighbors (NN) in highly dynamic scenarios, where the queries and the data objects move frequently and arbitrarily. All existing methods, however, assume the Euclidean distance metric. In this paper we study k-NN monitoring in road networks, where the distance between a query and a data object is determined by the length of the shortest path connecting them. We propose two methods that can handle arbitrary object and query moving patterns, as well as fluctuations of edge weights. The first one maintains the query results by processing only updates that may invalidate the current NN sets. The second method follows the shared execution paradigm to reduce the processing time. In particular, it groups together the queries that fall in the path between two consecutive intersections in the network, and produces their results by monitoring the NN sets of these intersections. We experimentally verify the applicability of the proposed techniques to continuous monitoring of large data and query sets.
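To make the notion of network distance in the abstract above concrete, here is a minimal sketch of a static k-NN query in which distance is the shortest-path length, computed by expanding Dijkstra's algorithm from the query node. It is not either of the two monitoring methods the paper proposes and handles no updates; the adjacency-list layout, the function name `network_knn`, and the toy road graph are assumptions for illustration.

```python
import heapq

def network_knn(graph, source, objects, k):
    """Return up to k (distance, node) pairs for the objects nearest to source.

    graph: dict mapping node -> list of (neighbor, edge_weight), weights >= 0.
    objects: set of nodes that host a data object.
    Nodes are settled in order of increasing shortest-path distance, so the
    search can stop as soon as k object nodes have been settled.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    settled = set()
    result = []
    while heap and len(result) < k:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue  # stale heap entry for an already settled node
        settled.add(u)
        if u in objects:
            result.append((d, u))
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return result

# Hypothetical undirected road network with weighted edges.
roads = {
    "a": [("b", 2.0), ("c", 5.0)],
    "b": [("a", 2.0), ("c", 1.0), ("d", 4.0)],
    "c": [("a", 5.0), ("b", 1.0), ("d", 1.0)],
    "d": [("b", 4.0), ("c", 1.0)],
}
nearest = network_knn(roads, "a", {"c", "d"}, 2)
```

In this toy graph the direct edge from `a` to `c` has weight 5.0, but the path through `b` has length 3.0, so the network distance differs from what a single edge weight would suggest.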
On trip planning queries in spatial databases
- In SSTD
, 2005
Cited by 42 (1 self)
In this paper we discuss a new type of query in Spatial Databases, called the Trip Planning Query (TPQ). Given a set of points of interest P in space, where each point belongs to a specific category, a starting point S and a destination E, TPQ retrieves the best trip that starts at S, passes through at least one point from each category, and ends at E. For example, a driver traveling from Boston to Providence might want to stop at a gas station, a bank and a post office on his way, and the goal is to provide him with the best possible route (in terms of distance, traffic, road conditions, etc.). The difficulty of this query lies in the existence of multiple choices per category. In this paper, we study fast approximation algorithms for TPQ in a metric space. We provide a number of approximation algorithms with approximation ratios that depend on either the number of categories, the maximum number of points
The V*-Diagram: A Query-Dependent Approach to Moving KNN Queries
, 2008
Cited by 35 (8 self)
The moving k nearest neighbor (MkNN) query finds the k nearest neighbors of a moving query point continuously. The high potential of reducing the query processing cost as well as the large spectrum of associated applications have attracted considerable attention to this query type from the database community. This paper presents an incremental safe-region-based technique for answering MkNN queries, called the V*-Diagram. In general, a safe region is a set of points where the query point can move without changing the query answer. Traditional safe-region approaches compute a safe region based on the data objects but independent of the query location. Our approach exploits the current knowledge of the query point and the search space in addition to the data objects. As a result, the V*-Diagram has much smaller IO and computation costs than existing methods. The experimental results show that the V*-Diagram outperforms the best existing technique by two orders of magnitude.
Efficient query processing on spatial networks
- In Proceedings of the 13th ACM International Symposium on Advances in Geographic Information Systems
, 2005
Cited by 34 (15 self)
A framework for determining the shortest path and the distance between every pair of vertices on a spatial network is presented. The framework, termed SILC, uses path coherence between the shortest path and the spatial positions of vertices on the spatial network, thereby resulting in an encoding that is compact in representation and fast in path and distance retrievals. Using this framework, a wide variety of spatial queries such as incremental nearest neighbor searches and spatial distance joins can be shown to work on datasets of locations residing on a spatial network of sufficiently large size. The suggested framework is suitable for both main memory and disk-resident datasets.
Distance indexing on road networks
- In PVLDB
, 2006
Cited by 29 (0 self)
The processing of kNN and continuous kNN queries on spatial network databases (SNDB) has been intensively studied recently. However, there is a lack of systematic study on the computation of network distances, which is the most fundamental difference between a road network and a Euclidean space. Since the online Dijkstra’s algorithm has been shown to be efficient only for short distances, we propose an efficient index, called distance signature, for distance computation and query processing over long distances. Distance signature discretizes the distances between objects and network nodes into categories and then encodes these categories. To minimize the storage and search costs, we present the optimal category partition, and the encoding and compression algorithms for the signatures, based on a simplified network topology. By mathematical analysis and experimental study, we showed that the signature index is efficient and robust for various data distributions, query workloads, parameter settings and network updates.
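The discretization step described in the abstract above can be sketched roughly as follows: each node stores, for every object, only the index of the distance category that the exact network distance falls into. The exponentially growing boundaries, the helper names `distance_category` and `build_signature`, and the sample distances are assumptions for illustration; the paper's optimal category partition, encoding, and compression are not reproduced here.

```python
import bisect

def distance_category(distance, boundaries):
    """Map a network distance to the index of the category containing it.

    boundaries: sorted upper bounds. Category i covers the range
    (boundaries[i-1], boundaries[i]]; anything beyond the last bound
    falls into one final overflow category.
    """
    return bisect.bisect_left(boundaries, distance)

def build_signature(distances_to_objects, boundaries):
    """Signature of one node: one category index per object, in a fixed order."""
    return [distance_category(d, boundaries) for d in distances_to_objects]

# Hypothetical boundaries (an assumed partition, not the paper's optimal one).
boundaries = [1, 2, 4, 8, 16]

# Hypothetical exact network distances from one node to four objects.
signature = build_signature([0.5, 3.0, 8.0, 40.0], boundaries)
```

Storing a small category index instead of an exact distance is what makes the signature compact; a query that needs an exact value would still have to refine the answer within the category's range.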