Results 1  10
of
151
On PowerLaw Relationships of the Internet Topology
 In SIGCOMM
, 1999
"... Despite the apparent randomness of the Internet, we discover some surprisingly simple powerlaws of the Internet topology. These powerlaws hold for three snapshots of the Internet, between November 1997 and December 1998, despite a 45% growth of its size during that period. We show that our powerl ..."
Abstract

Cited by 1247 (61 self)
 Add to MetaCart
Despite the apparent randomness of the Internet, we discover some surprisingly simple powerlaws of the Internet topology. These powerlaws hold for three snapshots of the Internet, between November 1997 and December 1998, despite a 45% growth of its size during that period. We show that our powerlaws fit the real data very well resulting in correlation coefficients of 96% or higher. Our observations provide a novel perspective of the structure of the Internet. The powerlaws describe concisely skewed distributions of graph properties such as the node outdegree. In addition, these powerlaws can be used to estimate important parameters such as the average neighborhood size, and facilitate the design and the performance analysis of protocols. Furthermore, we can use them to generate and select realistic topologies for simulation purposes. 1 Introduction "What does the Internet look like?" "Are there any topological properties that don't change in time?" "How will it look like a year ...
Fast Subsequence Matching in TimeSeries Databases
 SIGMOD 94
, 1994
"... We present an efficient indexing method to locate 1dimensional subsequences witbin a collection of sequences, such that the subsequences match a given (query) pattern within a specified tolerance. The idea is to map each data sequence into a small set of multidimensional rectangles in feature space ..."
Abstract

Cited by 430 (21 self)
 Add to MetaCart
We present an efficient indexing method to locate 1dimensional subsequences witbin a collection of sequences, such that the subsequences match a given (query) pattern within a specified tolerance. The idea is to map each data sequence into a small set of multidimensional rectangles in feature space. Then, these rectangles can be readily indexed using traditional spatial access methods, like the R*tree [9]. In more deteil, we use a sliding window over the data sequence and extract its features; the result is a trail in feature space. We propose an efficient and effective algorithm to divide such trails into subtrails, which are subsequently represented by their Minimum Bounding Rectangles (MBRs). We also examine queries of varying lengths, and we show how to handle each case efficiently. We implemented our method and carried out experiments on synthetic and real data (stock price movements). We compared the method to sequential scanning, which is the only obvious competitor. The results were excellent: our method accelerated the search time from 3 times up to 100 times.
On Packing Rtrees
 In ACM CIKM
, 1993
"... – main idea; file structure – algorithms: insertion/split – deletion – search: range, nn, spatial joins – performance analysis – variations (packed; hilbert;...) 15721 Copyright: C. Faloutsos (2001) 2 Problem • Given a collection of geometric objects (points, lines, polygons,...) • organize them on ..."
Abstract

Cited by 220 (16 self)
 Add to MetaCart
– main idea; file structure – algorithms: insertion/split – deletion – search: range, nn, spatial joins – performance analysis – variations (packed; hilbert;...) 15721 Copyright: C. Faloutsos (2001) 2 Problem • Given a collection of geometric objects (points, lines, polygons,...) • organize them on disk, to answer spatial queries (range, nn, etc) 15721 Copyright: C. Faloutsos (2001) 3 1 (Who cares?)
The TVtree  an index structure for highdimensional data
 VLDB Journal
, 1994
"... We propose a file structure to index highdimensionality data, typically, points in some feature space. The idea is to use only a few of the features, utilizing additional features whenever the additional discriminatory power is absolutely necessary. We present in detail the design of our tree struc ..."
Abstract

Cited by 193 (7 self)
 Add to MetaCart
We propose a file structure to index highdimensionality data, typically, points in some feature space. The idea is to use only a few of the features, utilizing additional features whenever the additional discriminatory power is absolutely necessary. We present in detail the design of our tree structure and the associated algorithms that handle such `varying length' feature vectors. Finally we report simulation results, comparing the proposed structure with the R tree, which is one of the most successful methods for lowdimensionality spaces. The results illustrate the superiority of our method, with up to 80% savings in disk accesses. Type of Contribution: New Index Structure, for highdimensionality feature spaces. Algorithms and performance measurements. Keywords: Spatial Index, Similarity Retrieval, Query by Content 1 Introduction Many applications require enhanced indexing, capable of performing similarity searching on several, nontraditional (`exotic') data types. The targ...
Hilbert Rtree: An improved Rtree using fractals
, 1994
"... We propose a new Rtree structure that outperforms all the older ones. The heart of the idea is to facilitate the deferred splitting approach in Rtrees. This is done by proposing an ordering on the Rtree nodes. This ordering has to be 'good', in the sense that it should group 'similar' data rectan ..."
Abstract

Cited by 184 (9 self)
 Add to MetaCart
We propose a new Rtree structure that outperforms all the older ones. The heart of the idea is to facilitate the deferred splitting approach in Rtrees. This is done by proposing an ordering on the Rtree nodes. This ordering has to be 'good', in the sense that it should group 'similar' data rectangles together, to minimize the area and perimeter of the resulting minimum bounding rectangles (MBRs). Following [19] we have chosen the socalled '2Dc' method, which sorts rectangles according to the Hilbert value of the center of the rectangles. Given the ordering, every node has a welldefined set of sibling nodes; thus, we can use deferred splitting. By adjusting the split policy, the Hilbert Rtree can achieve as high utilization as desired. To the contrary, the R tree has no control over the space utilization, typically achieving up to 70%. We designed the manipulation algorithms in detail, and we did a full implementation of the Hilbert Rtree. Our experiments show that the '2to...
Beyond uniformity and independence: Analysis of rtrees using the concept of fractal dimension
 In Proc. PODS
, 1994
"... We propose the concept of fractal dimension of a set of points, in order to quantify the deviation from the uniformity distribution. Using measurements on real data sets (road intersections of U.S. counties, star coordinates from NASA’s InfraredUltraviolet Explorer etc.) we provide evidence that re ..."
Abstract

Cited by 154 (19 self)
 Add to MetaCart
We propose the concept of fractal dimension of a set of points, in order to quantify the deviation from the uniformity distribution. Using measurements on real data sets (road intersections of U.S. counties, star coordinates from NASA’s InfraredUltraviolet Explorer etc.) we provide evidence that real data indeed are skewed, and, moreover, we show that they behave as mathematical fractals, with a measurable, noninteger fract al dimension. Armed with this tool, we then show its practical use in predicting the performance of spatial access methods, and specifically of the Rtrees. We provide the jirst analysis of Rtrees for skewed distributions of points: We develop a formula that estimates the number of disk accesses for range queries, given only the fractal dimension of the point set, and its count. Experiments on real data sets show that the formula is very accurate: the relative error is usually below 5%, and it rarely exceeds 10%. We believe that the fractal dimension will help replace the uniformity and independence assumptions, allowing more accurate analysis for any spatial access method, as well as better estimates for query optimization on multiattribute queries. 1
Estimating the Selectivity of Spatial Queries Using the `Correlation' Fractal Dimension
, 1995
"... We examine the estimation of selectivities for range and spatial join queries in real spatial databases. As we have shown earlier [FK94a], real point sets: (a) violate consistently the "uniformity" and "independence" assumptions, (b) can often be described as "fractals", with noninteger (fractal) d ..."
Abstract

Cited by 121 (16 self)
 Add to MetaCart
We examine the estimation of selectivities for range and spatial join queries in real spatial databases. As we have shown earlier [FK94a], real point sets: (a) violate consistently the "uniformity" and "independence" assumptions, (b) can often be described as "fractals", with noninteger (fractal) dimension. In this paper we show that, among the infinite family of fractal dimensions, the so called "Correlation Dimension" D 2 is the one that we need to predict the selectivity of spatial join. The main contribution is that, for all the real and synthetic pointsets we tried, the average number of neighbors for a given point of the pointset follows a power law, with D 2 as the exponent. This immediately solves the selectivity estimation for spatial joins, as well as for "biased" range queries (i.e., queries whose centers prefer areas of high point density). We present the formulas to estimate the selectivity for the biased queries, including an integration constant (K `shape 0 ) for ea...
PowerLaws and the ASlevel Internet Topology
 IEEE/ACM Transactions on Networking
, 2003
"... In this paper, we study and characterize the topology of the Internet at the Autonomous System level. First, we show that the topology can be described efficiently with powerlaws. The elegance and simplicity of the powerlaws provide a novel perspective into the seemingly uncontrolled Internet struc ..."
Abstract

Cited by 87 (10 self)
 Add to MetaCart
In this paper, we study and characterize the topology of the Internet at the Autonomous System level. First, we show that the topology can be described efficiently with powerlaws. The elegance and simplicity of the powerlaws provide a novel perspective into the seemingly uncontrolled Internet structure. Second, we show that powerlaws appear consistently over the last 5 years. We also observe that the powerlaws hold even in the most recent and more complete topology [10] with correlation coefficient above 99% for the degree powerlaw. In addition, we study the evolution of the powerlaw exponents over the 5 year interval and observe a variation for the degree based powerlaw of less than 10%. Third, we provide relationships between the exponents and other topological metrics.
Declustering Using Fractals
 In Proceedings of the 2nd International Conference on Parallel and Distributed Information Systems
, 1993
"... We propose a method to achieve declustering for cartesian product files on M units. The focus is on range queries, as opposed to partial match queries that older declustering methods have examined. Our method uses a distancepreserving mapping, namely, the Hilbert curve, to impose a linear ordering ..."
Abstract

Cited by 83 (2 self)
 Add to MetaCart
We propose a method to achieve declustering for cartesian product files on M units. The focus is on range queries, as opposed to partial match queries that older declustering methods have examined. Our method uses a distancepreserving mapping, namely, the Hilbert curve, to impose a linear ordering on the multidimensional points (buckets); then, it traverses the buckets according to this ordering, assigning buckets to disks in a roundrobin fashion. Thanks to the good distancepreserving properties of the Hilbert curve, the end result is that each disk contains buckets that are far away in the linear ordering, and, most probably, far away in the kd address space. This is exactly the goal of declustering. Experiments show that these intuitive arguments lead indeed to good performance: the proposed method performs at least as well or better than older declustering schemes. Categories and Subject Descriptors: E.1 [Data Structures]; E.5 [Files]; H.2.2 [Data Base Management]: Physical Des...
Efficient Retrieval of Similar Time Sequences Using DFT
 Proc. Int’l Conf. Foundations of Data Organizations and Algorithms
, 1998
"... We propose an improvement of the known DFTbased indexing technique for fast retrieval of similar time sequences. We use the last few Fourier coefficients in the distance computation without storing them in the index since every coefficient at the end is the complex conjugate of a coefficient at the ..."
Abstract

Cited by 54 (2 self)
 Add to MetaCart
We propose an improvement of the known DFTbased indexing technique for fast retrieval of similar time sequences. We use the last few Fourier coefficients in the distance computation without storing them in the index since every coefficient at the end is the complex conjugate of a coefficient at the beginning and as strong as its counterpart. We show analytically that this observation can accelerate the search time of the index by more than a factor of two. This result was confirmed by our experiments, which were carried out on real stock prices and synthetic data. Keywords similarity retrieval, time series indexing 1