Results 1–10 of 16
Counting Triangles in Data Streams
, 2006
Abstract

Cited by 68 (4 self)
We present two space-bounded random sampling algorithms that compute an approximation of the number of triangles in an undirected graph given as a stream of edges. Our first algorithm makes no assumptions on the order of edges in the stream. It uses space inversely related to the ratio between the number of triangles and the number of triples with at least one edge in the induced subgraph, and has constant expected update time per edge. Our second algorithm is designed for incidence streams (all edges incident to the same vertex appear consecutively). It uses space inversely related to the ratio between the number of triangles and the number of length-2 paths in the graph, and has expected update time O(log |V| · (1 + s·|V|/|E|)), where s is the space requirement of the algorithm. These results significantly improve over previous work [20, 8]. Since the space complexity depends only on the structure of the input graph and not on the number of nodes, our algorithms scale very well with increasing graph size, and so they provide a basic tool to analyze the structure of large graphs. They have many applications, for example, in the discovery of Web communities, the computa...
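The first algorithm's sampling idea can be sketched as follows: reservoir-sample one edge (a, b) from the stream, draw a third vertex v uniformly at random, and count the trial a success if both closing edges (a, v) and (b, v) appear later in the stream; rescaling the empirical success rate by m(n − 2) estimates the triangle count. A minimal Python sketch under the assumptions that the vertex set is known in advance and each undirected edge occurs once (function name and parameters are illustrative, not from the paper):

```python
import random

def estimate_triangles(stream, nodes, trials=2000, seed=1):
    """One-pass triangle estimator in the spirit of the first algorithm:
    each trial reservoir-samples one edge (a, b), draws a third vertex v
    uniformly, and succeeds if both closing edges (a, v) and (b, v)
    appear later in the stream."""
    rng = random.Random(seed)
    state = [None] * trials          # per trial: [a, b, v, saw_av, saw_bv]
    m = 0
    for x, y in stream:
        m += 1
        for s in range(trials):
            if rng.random() < 1.0 / m:    # reservoir step: keep edge m w.p. 1/m
                v = rng.choice(nodes)
                while v == x or v == y:   # v must differ from both endpoints
                    v = rng.choice(nodes)
                state[s] = [x, y, v, False, False]
            else:
                a, b, v, _, _ = state[s]
                if {x, y} == {a, v}:
                    state[s][3] = True
                if {x, y} == {b, v}:
                    state[s][4] = True
    # E[success] = T3 / (m * (n - 2)), so rescale the empirical mean.
    beta = sum(1 for s in state if s[3] and s[4]) / trials
    return beta * m * (len(nodes) - 2)
```

On the complete graph K4 (6 edges, 4 triangles) the estimate concentrates around 4, and the space used is O(trials), independent of the graph size, which is the point of the abstract's space claim.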
Coresets in Dynamic Geometric Data Streams
, 2005
Abstract

Cited by 32 (4 self)
A dynamic geometric data stream consists of a sequence of m insert/delete operations of points from the discrete space {1, ..., ∆}^d [26]. We develop streaming (1 + ɛ)-approximation algorithms for k-median, k-means, MaxCut, maximum weighted matching (MaxWM), maximum travelling salesperson (MaxTSP), maximum spanning tree (MaxST), and average distance over dynamic geometric data streams. Our algorithms maintain a small weighted set of points (a coreset) that, with probability 2/3, approximates the current point set with respect to the considered problem during the m insert/delete operations of the data stream. They use poly(ɛ^(-1), log m, log ∆) space and update time per insert/delete operation for constant k and dimension d. Given a coreset, one only needs a fast approximation algorithm for the weighted problem to compute a solution quickly. In fact, even an exponential algorithm is sometimes feasible, as its running time may still be polynomial in n. For example, one can compute in poly(log n, exp(O((1 + log(1/ɛ)/ɛ)^(d−1)))) time a solution to k-median and k-means [21], where n is the size of the current point set and k and d are constants. Finding an implicit solution to MaxCut can be done in poly(log n, exp((1/ɛ)^O(1))) time. For MaxST and average distance we require poly(log n, ɛ^(-1)) time, and for MaxWM we require O(n^3) time.
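To illustrate the coreset idea in a dynamic setting, the following toy sketch snaps points of {1, ..., ∆}^d to a coarse grid and keeps one weighted representative per occupied cell under insertions and deletions, then estimates average distance from the weighted representatives alone. This is a deliberate simplification of the paper's multi-resolution construction; all class and function names are illustrative:

```python
from collections import Counter
from itertools import combinations

class GridCoreset:
    """Toy dynamic coreset: one weighted representative per occupied grid
    cell.  Each insert/delete of a point takes O(d) time.  Assumes a
    delete is only issued for a previously inserted point."""
    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = Counter()   # cell id -> number of live points

    def insert(self, p):
        self.cells[tuple(c // self.cell_size for c in p)] += 1

    def delete(self, p):
        cell = tuple(c // self.cell_size for c in p)
        self.cells[cell] -= 1
        if self.cells[cell] == 0:
            del self.cells[cell]

    def weighted_points(self):
        """Cell centers with multiplicities: the coreset."""
        return [([(i + 0.5) * self.cell_size for i in cell], w)
                for cell, w in self.cells.items()]

def avg_distance(weighted):
    """Average inter-point distance estimated from the coreset alone
    (pairs falling in the same cell are approximated as distance 0)."""
    num = den = 0.0
    for (p, wp), (q, wq) in combinations(weighted, 2):
        d = sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
        num += wp * wq * d
        den += wp * wq
    for _, w in weighted:
        den += w * (w - 1) / 2
    return num / den if den else 0.0
```

This mirrors the abstract's workflow: the stream maintains only the small weighted set, and any (even slow) algorithm for the weighted problem is then run on that set instead of on the full point set.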
StreamKM++: A Clustering Algorithm for Data Streams
, 2010
Abstract

Cited by 21 (1 self)
We develop a new k-means clustering algorithm for data streams, which we call StreamKM++. Our algorithm computes a small weighted sample of the data stream and solves the problem on the sample using the k-means++ algorithm [1]. To compute the small sample, we propose two new techniques. First, we use a non-uniform sampling approach similar to the k-means++ seeding procedure to obtain small coresets from the data stream. This construction is rather easy to implement and, unlike other coreset constructions, its running time has only a low dependency on the dimensionality of the data. Second, we propose a new data structure which we call a coreset tree. The use of these coreset trees significantly speeds up the time necessary for the non-uniform sampling during our coreset construction. We compare our algorithm experimentally with two well-known streaming implementations (BIRCH [16] and StreamLS [4, 9]). In terms of quality (sum of squared errors), our algorithm is comparable with StreamLS and significantly better than BIRCH (up to a factor of 2). In terms of running time, our algorithm is slower than BIRCH. Comparing the running time with StreamLS, it turns out that our algorithm scales much better with an increasing number of centers. We conclude that, if the first priority is the quality of the clustering, then our algorithm provides a good alternative to BIRCH and StreamLS, in particular if the number of cluster centers is large. We also give a theoretical justification of our approach by proving that our sample set is a small coreset in low-dimensional spaces.
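The k-means++ seeding procedure that both the final solver and the non-uniform sampling build on can be sketched as follows: the first center is uniform, and every further center is drawn with probability proportional to its squared distance to the nearest center chosen so far. This is a minimal version on a plain point list, not the streaming coreset-tree variant; the function name is illustrative:

```python
import random

def kmeanspp_seed(points, k, rng):
    """k-means++ seeding (D^2 sampling).  Assumes at least k distinct
    points, so the total squared distance stays positive."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        # squared distance of every point to its nearest chosen center
        d2 = [min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
              for p in points]
        # draw the next center proportionally to d2
        r, acc = rng.random() * sum(d2), 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc > r:
                centers.append(p)
                break
    return centers
```

Because already-chosen centers have squared distance zero, they can never be drawn again, which is why far-away clusters are almost always covered by the seeding.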
Estimating clustering indexes in data streams
Abstract

Cited by 9 (1 self)
We present random sampling algorithms that with probability at least 1 − δ compute a (1 ± ɛ)-approximation of the clustering coefficient and of the number of bipartite clique subgraphs of a graph given as an incidence stream of edges. The space used by our algorithm to estimate the clustering coefficient is inversely related to the clustering coefficient of the network itself. The space used by our algorithm to compute the number of K3,3 bipartite cliques is proportional to the ratio between the number of K1,3 and K3,3 subgraphs in the graph. Since the space complexity depends only on the structure of the input graph and not on the number of nodes, our algorithms scale very well with increasing graph size. Therefore they provide a basic tool to analyze the structure of dense clusters in large graphs and have many applications in the discovery of web communities, the analysis of the structure of large social networks, and the probing of frequent patterns in large graphs. We implemented both algorithms and evaluated their performance on networks from different application domains and of different sizes; the largest instance is a web graph consisting of more than 135 million nodes and 1 billion edges. Both algorithms compute accurate results in reasonable time on the tested instances.
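The wedge-sampling principle behind the clustering-coefficient estimator can be illustrated offline: pick a random vertex of degree at least two, pick two of its neighbours at random, and check whether they are adjacent; the fraction of closed wedges estimates the average clustering coefficient over those vertices. A hedged sketch (not the paper's one-pass algorithm, which works directly on an incidence stream; names are illustrative):

```python
import random

def estimate_clustering_coefficient(adj, samples, rng):
    """Estimate the average clustering coefficient of the vertices with
    degree >= 2 by wedge sampling.  `adj` maps a vertex to its set of
    neighbours."""
    verts = [v for v in adj if len(adj[v]) >= 2]
    closed = 0
    for _ in range(samples):
        v = rng.choice(verts)                  # uniform vertex, deg >= 2
        a, b = rng.sample(sorted(adj[v]), 2)   # uniform wedge at v
        if b in adj[a]:                        # is the wedge closed?
            closed += 1
    return closed / samples
```

The number of samples needed depends only on ɛ, δ, and the coefficient being estimated, not on the graph size, which mirrors the abstract's space claim.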
Efficient Kinetic Data Structures for MaxCut
Abstract
We develop a randomized kinetic data structure that maintains a partition of the moving points into two sets such that the corresponding cut is, with probability at least 1 − ϱ, a (1 − ɛ)-approximation of the Euclidean MaxCut. The data structure answers queries of the form “to which side of the partition does query point p belong?” in O(2^(1/ɛ^O(1)) · log² n / ɛ^(2(d+1))) time. Under linear motion the data structure processes Õ(n log(ϱ⁻¹)/ɛ^(d+3)) events, each requiring O(log² n) expected time, except for a constant number of events that require Õ(n · ln(ϱ⁻¹)/ɛ^(d+3)) time. A flight plan update can be performed in O(log³ n · ln(ϱ⁻¹)/ɛ^(d+3)) average expected time, where the average is taken over the worst-case update times of the points at an arbitrary point in time. No efficient kinetic data structure for the MaxCut was known before.
Online Occlusion Culling
Abstract

Cited by 1 (0 self)
Abstract. Modern computer graphics systems are able to render sophisticated 3D scenes consisting of millions of polygons. For most camera positions, only a small fraction of these polygons is visible. We address the problem of occlusion culling, i.e., determining hidden primitives. Aila, Miettinen, and Nordlund suggested implementing a FIFO buffer on graphics cards that delays polygons before drawing them [2]. When a polygon in the buffer is occluded or masked by another polygon arriving later from the application, the rendering engine can drop the occluded one without rendering it, saving valuable rendering time. We introduce a theoretical online model for analysing these problems using competitive analysis. For different cost measures we give the first competitive algorithms for online occlusion culling. Our implementation shows that these algorithms outperform the FIFO strategy on real 3D scenes as well.
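The delayed-rendering idea can be modelled with a small simulation: polygons enter a bounded FIFO, a buffered polygon is dropped when a later-arriving polygon covers it at smaller depth, and the oldest polygon is rendered when the buffer overflows. A toy model with 1D screen intervals (the representation and cost model are illustrative, not from the paper):

```python
from collections import deque

def fifo_cull(stream, buffer_size):
    """Polygons are (lo, hi, depth) screen intervals.  A buffered polygon
    is culled if a later polygon covers its whole interval at a smaller
    depth; otherwise it is eventually rendered in FIFO order."""
    buf = deque()
    rendered, culled = [], 0
    for poly in stream:
        lo, hi, depth = poly
        kept = deque()
        for p in buf:
            if lo <= p[0] and p[1] <= hi and depth < p[2]:
                culled += 1          # fully hidden by the new polygon
            else:
                kept.append(p)
        buf = kept
        buf.append(poly)
        while len(buf) > buffer_size:
            rendered.append(buf.popleft())   # buffer full: draw the oldest
    rendered.extend(buf)
    return rendered, culled
```

The second test below shows the effect the competitive analysis is about: with a small buffer, a polygon can be forced out and rendered before its occluder arrives.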
Approximating Parameters of Large Graphs
Abstract
We present random sampling algorithms that with probability at least 1 − δ compute a (1 ± ɛ)-approximation of the clustering coefficient, the transitivity coefficient, and of the number of bipartite cliques in a graph given as a stream of edges. Our methods can be extended to approximately count the number of occurrences of fixed constant-size subgraphs. Our algorithms require only one pass over the input stream, and their storage space depends only on structural parameters of the graph, the approximation guarantee, and the confidence probability. For example, the space of the algorithms that compute the clustering and transitivity coefficients depends on those coefficients but not on the size of the graph. Since many large social networks have small clustering and transitivity coefficients, our algorithms use space independent of the size of the input for these graphs. We implemented our algorithms and evaluated their performance on networks from different application domains. The sizes of the considered input graphs varied from about 8,000 nodes and 40,000 edges to about 135 million nodes and more than 1 billion edges. For both algorithms we ran experiments...
Computing Clustering Coefficients in Data Streams
Abstract
We present random sampling algorithms that with probability at least 1 − δ compute a (1 ± ǫ)-approximation of the clustering coefficient, the transitivity coefficient, and of the number of bipartite cliques in a graph given as a stream of edges. Our methods can be extended to approximately count the number of occurrences of fixed constant-size subgraphs. Our algorithms require only one pass over the input stream, and their storage space depends only on structural parameters of the graph, the approximation guarantee, and the confidence probability. For example, the space of the algorithms that compute the clustering and transitivity coefficients depends on those coefficients but not on the size of the graph. Since many large social networks have small clustering and transitivity coefficients, our algorithms use space independent of the size of the input for these graphs. We implemented our algorithms and evaluated their performance on networks from different application domains. The sizes of the considered input graphs varied from about 8,000 nodes and 40,000 edges to about 135 million nodes and more than 1 billion edges. For both algorithms we ran experiments...
Fighting Against Two Adversaries: Page Migration in Dynamic Networks
Abstract
Page migration is one of the fundamental problems in the framework of data management in networks. It occurs in a distributed network of processors sharing one indivisible memory page of size D, which is stored at one of the processors. During runtime, processors access unit-size data items from the page, and the system is allowed to move the page from one processor to another in order to minimize the total communication cost. This problem has been considered in the online setting numerous times by many researchers, and some online algorithms were proven to achieve a cost within a constant factor of the optimal offline solution. However, all these results were achieved under the assumption that the communication costs between processors are fixed during the execution of the whole process. In this paper we consider a model in which the communication...
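For context, a classic counter-based strategy for the fixed-cost setting (not the algorithm developed in this paper) can be sketched as: serve each request remotely at the distance cost, and migrate the page of size D to a node once that node has issued D requests since the last migration, paying D times the distance for the move. All names are illustrative:

```python
def page_migration_cost(requests, dist, D, start):
    """Simulate a counter-based page-migration heuristic.
    `requests` is the sequence of requesting nodes, `dist` a symmetric
    distance matrix, D the page size, `start` the initial owner.
    Returns the final owner and the total communication cost."""
    owner, cost = start, 0
    counts = {}
    for v in requests:
        cost += dist[owner][v]            # serve the request remotely
        counts[v] = counts.get(v, 0) + 1
        if v != owner and counts[v] >= D:
            cost += D * dist[owner][v]    # move the page of size D
            owner = v
            counts = {}                   # restart the counters
    return owner, cost
```

The counter threshold balances the two costs the abstract describes: serving requests from afar versus paying D times the distance to move the page.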