On Power-Law Relationships of the Internet Topology
 IN SIGCOMM
, 1999
Cited by 1670 (70 self)
Despite the apparent randomness of the Internet, we discover some surprisingly simple power-laws of the Internet topology. These power-laws hold for three snapshots of the Internet, between November 1997 and December 1998, despite a 45% growth of its size during that period. We show that our power-laws fit the real data very well, resulting in correlation coefficients of 96% or higher. Our observations provide a novel perspective of the structure of the Internet. The power-laws describe concisely skewed distributions of graph properties such as the node outdegree. In addition, these power-laws can be used to estimate important parameters such as the average neighborhood size, and facilitate the design and the performance analysis of protocols. Furthermore, we can use them to generate and select realistic topologies for simulation purposes.
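As a rough illustration of what fitting a power-law and reporting a correlation coefficient can look like, the sketch below fits a straight line in log-log space by least squares. The function name `fit_power_law` and the synthetic rank/degree data are assumptions made for this example; they are not taken from the paper, and the paper's own fitting procedure may differ.

```python
import math

def fit_power_law(xs, ys):
    """Fit y = c * x**a by least squares on (log x, log y).

    Returns (a, c, r), where r is the Pearson correlation
    coefficient of the log-transformed data.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    syy = sum((v - my) ** 2 for v in ly)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    exponent = sxy / sxx
    constant = math.exp(my - exponent * mx)
    corr = sxy / math.sqrt(sxx * syy)
    return exponent, constant, corr

# Synthetic data (assumed): degree decays as rank**-0.8.
ranks = list(range(1, 51))
degrees = [100.0 * r ** -0.8 for r in ranks]
exponent, constant, corr = fit_power_law(ranks, degrees)
print(exponent, constant, abs(corr))
```

On noise-free synthetic data the fit recovers the exponent and the absolute correlation is essentially 1; on measured data the absolute correlation would indicate how closely the points follow a line in log-log scale.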
Multidimensional Access Methods
, 1998
Cited by 686 (3 self)
Search operations in databases require special support at the physical level. This is true for conventional databases as well as spatial databases, where typical search operations include the point query (find all objects that contain a given search point) and the region query (find all objects that overlap a given search region).
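The two query types named in this abstract can be made concrete with a brute-force scan over axis-aligned rectangles. This is only a baseline sketch: it uses no access method at all, and the `Rect` type, the function names, and the sample objects are hypothetical.

```python
from typing import NamedTuple

class Rect(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float

def contains_point(r, x, y):
    # True if the point lies inside or on the border of r.
    return r.xmin <= x <= r.xmax and r.ymin <= y <= r.ymax

def overlaps(a, b):
    # True if the two rectangles share at least one point.
    return (a.xmin <= b.xmax and b.xmin <= a.xmax and
            a.ymin <= b.ymax and b.ymin <= a.ymax)

def point_query(objects, x, y):
    """Names of all objects that contain the search point."""
    return sorted(k for k, r in objects.items() if contains_point(r, x, y))

def region_query(objects, region):
    """Names of all objects that overlap the search region."""
    return sorted(k for k, r in objects.items() if overlaps(r, region))

# Hypothetical data set of three rectangles.
objects = {
    "a": Rect(0, 0, 2, 2),
    "b": Rect(1, 1, 3, 3),
    "c": Rect(5, 5, 6, 6),
}
print(point_query(objects, 1.5, 1.5))
print(region_query(objects, Rect(2.5, 2.5, 5.5, 5.5)))
```

A multidimensional access method aims to answer the same two queries without inspecting every object.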
A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces
 Proceedings of the 24th VLDB International Conference on Very Large Data Bases
, 1998
Searching in metric spaces
, 2001
Cited by 436 (38 self)
The problem of searching the elements of a set that are close to a given query element under some similarity criterion has a vast number of applications in many branches of computer science, from pattern recognition to textual and multimedia information retrieval. We are interested in the rather general case where the similarity criterion defines a metric space, instead of the more restricted case of a vector space. Many solutions have been proposed in different areas, in many cases without cross-knowledge. Because of this, the same ideas have been reconceived several times, and very different presentations have been given for the same approaches. We present some basic results that explain the intrinsic difficulty of the search problem. This includes a quantitative definition of the elusive concept of “intrinsic dimensionality.” We also present a unified
When Is "Nearest Neighbor" Meaningful?
 In Int. Conf. on Database Theory
, 1999
Cited by 408 (2 self)
We explore the effect of dimensionality on the "nearest neighbor" problem. We show that under a broad set of conditions (much broader than independent and identically distributed dimensions), as dimensionality increases, the distance to the nearest data point approaches the distance to the farthest data point. To provide a practical perspective, we present empirical results on both real and synthetic data sets that demonstrate that this effect can occur for as few as 10-15 dimensions. These results should not be interpreted to mean that high-dimensional indexing is never meaningful; we illustrate this point by identifying some high-dimensional workloads for which this effect does not occur. However, our results do emphasize that the methodology used almost universally in the database literature to evaluate high-dimensional indexing techniques is flawed, and should be modified. In particular, most such techniques proposed in the literature are not evaluated versus simple...
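The effect described in this abstract can be observed with a small simulation. The sketch below assumes the simplest setting, independent uniform coordinates, which is only one of the conditions the abstract mentions. The function name `relative_contrast`, the sample size, and the seed are choices made for this example.

```python
import math
import random

def relative_contrast(dim, n_points=500, seed=0):
    """Ratio of farthest to nearest distance from a random query point.

    Data and query are drawn uniformly from the unit hypercube
    (an assumption made for this sketch).
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    return max(dists) / min(dists)

low = relative_contrast(2)     # low-dimensional data
high = relative_contrast(200)  # high-dimensional data
print(low, high)
```

In this setting the ratio for 200 dimensions comes out much closer to 1 than the ratio for 2 dimensions, meaning the nearest and farthest points are at nearly the same distance from the query.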
Generalized Search Trees for Database Systems
 IN PROC. 21ST INTERNATIONAL CONFERENCE ON VLDB
, 1995
Cited by 237 (18 self)
This paper introduces the Generalized Search Tree (GiST), an index structure supporting an extensible set of queries and data types. The GiST allows new data types to be indexed in a manner supporting queries natural to the types; this is in contrast to previous work on tree extensibility which only supported the traditional set of equality and range predicates. In a single data structure, the GiST provides all the basic search tree logic required by a database system, thereby unifying disparate structures such as B+-trees and R-trees in a single piece of code, and opening the application of search trees to general extensibility. To illustrate the flexibility of the GiST, we provide simple method implementations that allow it to behave like a B+-tree, an R-tree, and an RD-tree, a new index for data with set-valued attributes. We also present a preliminary performance analysis of RD-trees, which leads to discussion on the nature of tree indices and how they behave for various datasets.
MindReader: Querying databases through multiple examples
 In Proc. of the 24th VLDB Conference
, 1998
Cited by 211 (2 self)
Users often cannot easily express their queries. For example, in a multimedia/image by content setting, the user might want photographs with sunsets; in current systems, like QBIC, the user has to give a sample query, and to specify the relative importance of color, shape and texture. Even worse, the user might want correlations between attributes; for example, in a traditional medical record database, a medical researcher might want to find "mildly overweight patients", where the implied query would be "weight/height ≈ 4 lb/inch". Our goal is to provide a user-friendly, but theoretically solid method, to handle such queries. We allow the user to give several examples, and, optionally, their 'goodness' scores, and we propose a novel method to "guess" which attributes are important, which correlations are important, and with what weight. Our contributions are twofold: (a) we formalize the problem as a minimization problem and show how to solve for the optimal solution, completely av...
A Model for the Prediction of R-tree Performance
, 1996
Cited by 168 (21 self)
In this paper we present an analytical model that predicts the performance of R-trees (and its variants) when a range query needs to be answered. The cost model uses knowledge of the dataset only, i.e., the proposed formula that estimates the number of disk accesses is a function of data properties, namely, the amount of data and their density in the work space. In other words, the proposed model is applicable even before the construction of the R-tree index, a fact that makes it a useful tool for dynamic spatial databases. Several experiments on synthetic and real datasets show that the proposed analytical model is very accurate, the relative error being usually around 10%-15%, for uniform and non-uniform distributions. We believe that this error is involved with the gap between efficient R-tree variants, like the R*-tree, and an optimum, not implemented yet, method. Our work extends previous research concerning R-tree analysis and constitutes a useful tool for spatial query optimiz...
Similarity-Based Queries for Time Series Data
 Proc. 1997 ACM SIGMOD Conf
, 1997
Cited by 158 (6 self)
We study a set of linear transformations on the Fourier series representation of a sequence that can be used as the basis for similarity queries on time-series data. We show that our set of transformations is rich enough to formulate operations such as moving average and time warping. We present a query processing algorithm that uses the underlying R-tree index of a multidimensional data set to answer similarity queries efficiently. Our experiments show that the performance of this algorithm is competitive to that of processing ordinary (exact match) queries using the index, and much faster than sequential scanning. We relate our transformations to the general framework for similarity queries of Jagadish et al.
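One way to see how a moving average can be written as a linear transformation on Fourier coefficients is the convolution theorem: a circular moving average in the time domain equals an element-wise multiplication in the frequency domain. The sketch below assumes a circular (wrap-around) window and a plain O(n^2) DFT. All names and the sample series are illustrative, and the paper's exact formulation may differ.

```python
import cmath

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
            for f in range(n)]

def idft(X):
    n = len(X)
    return [sum(X[f] * cmath.exp(2j * cmath.pi * f * t / n) for f in range(n)) / n
            for t in range(n)]

def circular_moving_average(x, w):
    # Average of the current value and the w - 1 preceding ones, wrapping around.
    n = len(x)
    return [sum(x[(t - k) % n] for k in range(w)) / w for t in range(n)]

def moving_average_multipliers(n, w):
    # DFT of the averaging kernel: one complex multiplier per frequency.
    kernel = [1.0 / w if k < w else 0.0 for k in range(n)]
    return dft(kernel)

series = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]  # assumed sample data
w = 3
multipliers = moving_average_multipliers(len(series), w)
transformed = [a * X for a, X in zip(multipliers, dft(series))]
via_fourier = [v.real for v in idft(transformed)]
direct = circular_moving_average(series, w)
print(via_fourier)
print(direct)
```

Both printed lists agree up to floating-point error, which shows that the smoothing can be applied directly to the Fourier representation of the sequence.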
Estimating the Selectivity of Spatial Queries Using the `Correlation' Fractal Dimension
, 1995
Cited by 125 (18 self)
We examine the estimation of selectivities for range and spatial join queries in real spatial databases. As we have shown earlier [FK94a], real point sets: (a) violate consistently the "uniformity" and "independence" assumptions, (b) can often be described as "fractals", with non-integer (fractal) dimension. In this paper we show that, among the infinite family of fractal dimensions, the so-called "Correlation Dimension" D2 is the one that we need to predict the selectivity of spatial join. The main contribution is that, for all the real and synthetic point sets we tried, the average number of neighbors for a given point of the point set follows a power law, with D2 as the exponent. This immediately solves the selectivity estimation for spatial joins, as well as for "biased" range queries (i.e., queries whose centers prefer areas of high point density). We present the formulas to estimate the selectivity for the biased queries, including an integration constant (K `shape 0 ) for ea...
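The power law stated in this abstract suggests a simple way to estimate the correlation dimension: count the pairs of points within distance r for several radii and take the slope of the counts against r in log-log scale. The sketch below does this by brute force on a synthetic set of points lying on a diagonal line, whose dimension should be close to 1. The function names, the radii, and the data are assumptions for this example, not the paper's procedure.

```python
import math

def pair_counts(points, radii):
    """Number of point pairs within each radius (brute force, O(n^2))."""
    counts = [0] * len(radii)
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = math.dist(points[i], points[j])
            for k, r in enumerate(radii):
                if d <= r:
                    counts[k] += 1
    return counts

def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    sxx = sum((u - mx) ** 2 for u in lx)
    return sxy / sxx

# Synthetic point set (assumed): 400 evenly spaced points on a diagonal.
n = 400
line_points = [(i / n, i / n) for i in range(n)]
radii = [0.02, 0.04, 0.08, 0.16]
counts = pair_counts(line_points, radii)
d2_estimate = loglog_slope(radii, counts)
print(counts, d2_estimate)
```

The pair count is proportional to the average number of neighbors per point, so the fitted slope serves as the exponent estimate. For this one-dimensional set embedded in the plane it comes out close to 1 rather than 2.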