Fast and accurate estimation of shortest paths in large graphs
- In Proceedings of Conference on Information and Knowledge Management (CIKM), 2010
Cited by 5 (0 self)
Computing shortest paths between two given nodes is a fundamental operation over graphs, but known to be nontrivial over large disk-resident instances of graph data. While a number of techniques exist for answering reachability queries and approximating node distances efficiently, determining actual shortest paths (i.e. the sequence of nodes involved) is often neglected. However, in applications arising in massive online social networks, biological networks, and knowledge graphs it is often essential to find out many, if not all, shortest paths between two given nodes. In this paper, we address this problem and present a scalable sketch-based index structure that not only supports estimation of node distances, but also computes corresponding shortest paths themselves. Generating the actual path information allows for further improvements to the estimation accuracy of distances (and paths), leading to near-exact shortest-path approximations in real-world graphs. We evaluate our techniques – implemented within a fully functional RDF graph database system – over large real-world social and biological networks of sizes ranging from tens of thousands to millions of nodes and edges. Experiments on several datasets show that we can achieve query response times providing several orders of magnitude speedup over traditional path computations while keeping the estimation errors between 0% and 1% on average.
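As a rough illustration of index-based distance estimation, the sketch below precomputes BFS distances from a few landmark nodes and answers a query with the best detour through a landmark. This is a simplified landmark scheme chosen for illustration, not the sketch-based index of the paper; the function names and the toy graph are hypothetical.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from source to every reachable node (unweighted graph)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def build_landmark_index(adj, landmarks):
    """One BFS per landmark, run once ahead of query time."""
    return {lm: bfs_distances(adj, lm) for lm in landmarks}

def estimate_distance(index, u, v):
    """Upper bound on d(u, v): the shortest detour through any landmark."""
    candidates = [d[u] + d[v] for d in index.values() if u in d and v in d]
    return min(candidates) if candidates else None

# Hypothetical toy graph: the path 0-1-2-3-4 plus a chord 1-3.
adj = {0: [1], 1: [0, 2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}
index = build_landmark_index(adj, landmarks=[1])
```

By the triangle inequality the estimate never falls below the true distance; it is exact when some landmark lies on a shortest path between the two query nodes, and an overestimate otherwise.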
Orion: Shortest Path Estimation for Large Social Graphs
Cited by 2 (1 self)
Through measurements, researchers continue to produce large social graphs that capture relationships, transactions, and social interactions between users. Efficient analysis of these graphs requires algorithms that scale well with graph size. We examine node distance computation, a critical primitive in graph problems such as computing node separation, centrality computation, mutual friend detection, and community detection. For large million-node social graphs, computing even a single shortest path using traditional breadth-first search can take several seconds. In this paper, we propose a novel node distance estimation mechanism that effectively maps nodes in high-dimensional graphs to positions in low-dimensional Euclidean coordinate spaces, thus allowing constant-time node distance computation. We describe Orion, a prototype graph coordinate system, and explore critical decisions in its design. Finally, we evaluate the accuracy of Orion’s node distance estimates, and show that it can produce accurate results in applications such as node separation, node centrality, and ranked social search.
A Continuous Query System for Dynamic Route Planning
Cited by 2 (0 self)
In this paper, we address the problem of answering continuous route planning queries over a road network, in the presence of updates to the delay (cost) estimates of links. A simple approach to this problem would be to recompute the best path for all queries on arrival of every delay update. However, such a naive approach scales poorly when there are many users who have requested routes in the system. Instead, we propose two new classes of approximate techniques – K-paths and proximity measures – to substantially speed up processing of the set of designated routes specified by continuous route planning queries in the face of incoming traffic delay updates. Our techniques work through a combination of precomputation of likely good paths and by avoiding complete recalculations on every delay update, instead only sending the user new routes when delays change significantly. Based on an experimental evaluation with 7,000 drives from real taxi cabs, we found that the routes delivered by our techniques are within 5% of the best shortest path and have run times an order of magnitude or less compared to a naive approach.
On k-skip Shortest Paths
Cited by 1 (0 self)
Given two vertices s, t in a graph, let P be the shortest path (SP) from s to t, and P⋆ a subset of the vertices in P. P⋆ is a k-skip shortest path from s to t if it includes at least one vertex out of every k consecutive vertices in P. In general, P⋆ succinctly describes P by sampling the vertices in P with a rate of at least 1/k. This makes P⋆ a natural substitute in scenarios where reporting every single vertex of P is unnecessary or even undesired. This paper studies k-skip SP computation in the context of spatial network databases (SNDB). Our technique has two properties crucial for real-time query processing in SNDB. First, our solution is able to answer k-skip queries significantly faster than finding the original SPs in their entirety. Second, the previous objective is achieved with a structure that occupies less space than storing the underlying road network. The proposed algorithms are the outcome of a careful theoretical analysis that reveals valuable insight into the characteristics of the k-skip SP problem. Their efficiency has been confirmed by extensive experiments with real data.
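The k-skip definition can be checked mechanically. The sketch below assumes the full path is already known and given as a list of distinct vertices; it only illustrates the definition and is not the paper's query algorithm. The function name and the sample path are hypothetical.

```python
def is_k_skip(path, sample, k):
    """True if `sample` keeps at least one vertex from every k consecutive vertices of `path`."""
    kept = set(sample)
    if not kept.issubset(path):
        return False  # the sample must be drawn from the path itself
    # Slide a window of k consecutive path vertices and require one kept vertex in each.
    for start in range(len(path) - k + 1):
        if not any(v in kept for v in path[start:start + k]):
            return False
    return True

# Hypothetical shortest path from s to t.
path = ["s", "a", "b", "c", "d", "t"]
every_third = path[::3]  # keeping every k-th vertex always satisfies the definition
```

Any k consecutive positions contain a multiple of k, which is why plain striding with step k yields a valid k-skip sample.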
University Student Use of the Wikipedia
The 2008 proxy log covering all student access to the Wikipedia from the University of Otago is analysed. The log covers 17,635 student users for all 366 days in the year, amounting to over 577,973 user sessions. The analysis shows the Wikipedia is used every hour of the day, but seasonally. Use is low between semesters, rising steadily throughout the semester until it peaks at around exam time. The analysis of the articles that are retrieved as well as an analysis of which links are clicked shows that the Wikipedia is used for study-related purposes. Medical documents are popular, reflecting the specialty of the university. The mean Wikipedia session length is about a minute and a half and consists of about three clicks. The click graph the users generated is compared to the link graph in the Wikipedia. In about 14% of the user sessions the user has chosen a sub-optimal path from the start of their session to the final document they view. In 33% the path is better than optimal, suggesting that users prefer searching to following the link graph. When they do click, they click links in the running text (93.6%) and rarely on “See Also” links (6.4%), but this bias disappears when the frequency of these types of links’ occurrence is corrected for. Several recommendations for changes to the link discovery methodology are made. These changes include using highly viewed articles from the log as test data and using user clicks as user judgements. Keywords: Information Retrieval, Link Discovery.
Online Computation of Fastest Path in Time-Dependent Spatial Networks
The problem of point-to-point fastest path computation in static spatial networks is extensively studied with many precomputation techniques proposed to speed up the computation. Most of the existing approaches make the simplifying assumption that travel-times of the network edges are constant. However, with real-world spatial networks the edge travel-times are time-dependent, where the arrival-time to an edge determines the actual travel-time on the edge. In this paper, we study the online computation of fastest path in time-dependent spatial networks and present a technique which speeds up the path computation. We show that our fastest path computation based on a bidirectional time-dependent A* search significantly improves the computation time and storage complexity. With extensive experiments using real data-sets (including a variety of large spatial networks with real traffic data) we demonstrate the efficacy of our proposed techniques for online fastest path computation.
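To make the time-dependent setting concrete, the sketch below runs a plain one-directional Dijkstra search in which each edge's travel time is a function of the arrival time at that edge. It is a baseline for illustration, not the bidirectional time-dependent A* of the paper, and it assumes FIFO edges (leaving later never lets you arrive earlier). The graph and all names are hypothetical.

```python
import heapq

def fastest_arrival(graph, source, target, depart):
    """Earliest arrival time at `target` when leaving `source` at time `depart`."""
    best = {source: depart}
    heap = [(depart, source)]
    while heap:
        now, node = heapq.heappop(heap)
        if node == target:
            return now
        if now > best.get(node, float("inf")):
            continue  # stale heap entry; a better arrival was already settled
        for nxt, travel_time in graph.get(node, []):
            arrive = now + travel_time(now)  # cost depends on when the edge is entered
            if arrive < best.get(nxt, float("inf")):
                best[nxt] = arrive
                heapq.heappush(heap, (arrive, nxt))
    return None  # target unreachable

# Hypothetical network: the direct edge A->B is congested early on and clears later.
graph = {
    "A": [("B", lambda t: max(10, 30 - t)), ("C", lambda t: 5)],
    "C": [("B", lambda t: 10)],
}
```

With this toy network the detour through C is faster for an early departure, while the direct edge is faster once congestion has cleared, so the best route depends on the departure time.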
The Filter-Placement Problem and its Application to Minimizing Information Multiplicity
In many information networks, data items – such as updates in social networks, news flowing through interconnected RSS feeds and blogs, measurements in sensor networks, route updates in ad-hoc networks – propagate in an uncoordinated manner: nodes often relay information they receive to neighbors, independent of whether or not these neighbors received the same information from other sources. This uncoordinated data dissemination may result in significant, yet unnecessary communication and processing overheads, ultimately reducing the utility of information networks. To alleviate the negative impacts of this information multiplicity phenomenon, we propose that a subset of nodes (selected at key positions in the network) carry out additional information filtering functionality. Thus, nodes are responsible for the removal (or significant reduction) of the redundant data items relayed through them. We refer to such nodes as filters. We formally define the Filter Placement problem as a combinatorial optimization problem, and study its computational complexity for different types of graphs. We also present polynomial-time approximation algorithms and scalable heuristics for the problem. Our experimental results, which we obtained through extensive simulations on synthetic and real-world information flow networks, suggest that in many settings a relatively small number of filters is fairly effective in removing a large fraction of redundant information.
Atlas: Approximating Shortest Paths in Social Graphs
The search for shortest paths is an essential primitive for a variety of graph-based applications, particularly those on online social networks. For example, LinkedIn users perform queries to find the shortest path “social links” connecting them to a particular user to facilitate introductions. This type of graph query is challenging for moderately sized graphs, but becomes computationally intractable for graphs underlying today’s social networks, most of which contain millions of nodes and billions of edges. We propose Atlas, a novel approach to scalably approximate shortest paths between graph nodes using a collection of spanning trees. Spanning trees are easy to generate, compact relative to original graphs, and can be distributed across machines to parallelize queries. We demonstrate its scalability and effectiveness using 6 large social graphs from Facebook, Orkut and Renren, the largest of which includes 43 million nodes and 1 billion edges. We describe techniques to incrementally update Atlas as social graphs change over time. We capture graph dynamics using 35 daily snapshots of a Facebook network, and show that Atlas can amortize the cost of tree updates over time. Finally, we apply Atlas to several graph applications, and show that they produce results that closely approximate ideal results.
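The idea of answering path queries from a collection of spanning trees can be sketched as follows. The abstract does not say how Atlas builds or selects its trees, so this example simply assumes BFS trees from a few chosen roots on a connected, unweighted graph; all names and the toy graph are hypothetical.

```python
from collections import deque

def bfs_tree(adj, root):
    """Parent pointers and depths of a BFS spanning tree rooted at `root`."""
    parent, depth = {root: None}, {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in parent:
                parent[nxt] = node
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    return parent, depth

def tree_path(tree, u, v):
    """The unique u-v path inside one tree, found by climbing to the common ancestor."""
    parent, depth = tree
    left, right = [u], [v]
    while u != v:
        if depth[u] >= depth[v]:
            u = parent[u]
            left.append(u)
        else:
            v = parent[v]
            right.append(v)
    # Both lists end at the common ancestor; drop it from `right` before joining.
    return left + right[-2::-1]

def approx_shortest_path(trees, u, v):
    """Shortest of the candidate tree paths; never shorter than the true shortest path."""
    return min((tree_path(t, u, v) for t in trees), key=len)

# Hypothetical toy graph: a 6-cycle 0-1-2-3-4-5-0, with trees rooted at 0 and 3.
adj = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
trees = [bfs_tree(adj, 0), bfs_tree(adj, 3)]
```

Each tree path is a valid path in the original graph, so the result is always an upper bound on the true distance; adding trees can only tighten it, and each tree can be queried independently of the others.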
Efficient Shortest Paths on Massive Social Graphs
Analysis of large networks is a critical component of many of today’s application environments, including online social networks, protein interactions in biological networks, and Internet traffic analysis. The arrival of massive network graphs with hundreds of millions of nodes, e.g. social graphs, presents a unique challenge to graph analysis applications. Most of these applications rely on computing distances between node pairs, which for large graphs can take minutes to compute using traditional algorithms such as breadth-first search (BFS). In this paper, we study ways to enable scalable graph processing for today’s massive networks. We explore the design space of graph coordinate systems, a new approach that accurately approximates node distances in constant time by embedding graphs into coordinate spaces. We show that a hyperbolic embedding produces relatively low distortion error, and propose Rigel, a hyperbolic graph coordinate system that lends itself to efficient parallelization across a compute cluster. Rigel produces significantly more accurate results than prior systems, and is naturally parallelizable across compute clusters, allowing it to provide accurate results for graphs of up to 43 million nodes. Finally, we show that Rigel’s functionality can be easily extended to locate (near-) shortest paths between node pairs. After a one-time preprocessing cost, Rigel answers node-distance queries in tens of microseconds, and also produces shortest path results up to 18 times faster than prior shortest-path systems with similar levels of accuracy.
Distance Oracles for Stretch Less Than 2
We present distance oracles for weighted undirected graphs that return distances of stretch less than 2. For the realistic case of sparse graphs, our distance oracles exhibit a smooth three-way trade-off between space, stretch and query time, a phenomenon that does not occur in dense graphs. In particular, for any positive integer t and for any 1 ≤ α ≤ n, our distance oracle is of size O(m + n^2/α) and returns distances of stretch at most 1 + 2/(t+1) in time O((αµ)^t), where µ = 2m/n is the average degree of the graph. The query time can be further reduced to O((α + µ)^t) at the expense of a small additive stretch.
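The trade-off stated in the abstract can be tabulated directly from its formulas. The sketch below evaluates the stretch bound 1 + 2/(t+1) and the dominant terms inside the O(...) expressions; constant factors hidden by the asymptotic notation are ignored, and the function names and sample sizes are hypothetical.

```python
from fractions import Fraction

def stretch_bound(t):
    """Stretch guarantee 1 + 2/(t+1) for a positive integer t."""
    return 1 + Fraction(2, t + 1)

def size_term(m, n, alpha):
    """Dominant term of the oracle size, m + n^2/alpha (constants ignored)."""
    return m + n * n / alpha

def query_term(m, n, alpha, t):
    """Dominant term of the query time, (alpha * mu)^t with mu = 2m/n."""
    mu = 2 * m / n
    return (alpha * mu) ** t

# Stretch guarantees for the first few values of t.
for t in range(1, 5):
    print(t, stretch_bound(t))
```

Raising t tightens the stretch toward 1 but raises the query-time term exponentially, while raising α shrinks the space term and enlarges the query-time term.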

