An Efficient Approach to Clustering in Large Multimedia Databases with Noise
1998
Cited by 207 (9 self)
Abstract:
Several clustering algorithms can be applied to clustering in large multimedia databases. The effectiveness and efficiency of the existing algorithms, however, is somewhat limited, since clustering in multimedia databases requires clustering high-dimensional feature vectors and since multimedia databases often contain large amounts of noise. In this paper, we therefore introduce a new algorithm for clustering in large multimedia databases called DENCLUE (DENsity-based CLUstEring). The basic idea of our new approach is to model the overall point density analytically as the sum of influence functions of the data points. Clusters can then be identified by determining density-attractors, and clusters of arbitrary shape can be easily described by a simple equation based on the overall density function. The advantages of our new approach are (1) it has a firm mathematical basis, (2) it has good clustering properties in data sets with large amounts of noise, (3) it allows a compact mathematical ...
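As a rough illustration of the approach described above, the sketch below models a 1-D point density as the sum of Gaussian influence functions and follows the gradient uphill to a density-attractor. The smoothing parameter `sigma`, the step size, and the numerical gradient ascent are illustrative choices, not the DENCLUE paper's actual procedure:

```python
import math

def gaussian_influence(x, y, sigma=1.0):
    """Gaussian influence of data point y on location x (1-D for simplicity)."""
    return math.exp(-((x - y) ** 2) / (2 * sigma ** 2))

def density(x, data, sigma=1.0):
    """Overall point density: sum of the influence functions of all data points."""
    return sum(gaussian_influence(x, p, sigma) for p in data)

def hill_climb(x, data, sigma=1.0, step=0.02, iters=200):
    """Follow the density gradient to a local maximum (a density-attractor)."""
    for _ in range(iters):
        # central-difference numerical gradient of the density
        grad = (density(x + 1e-4, data, sigma) - density(x - 1e-4, data, sigma)) / 2e-4
        if abs(grad) < 1e-8:
            break
        x += step * grad
    return x

# two well-separated 1-D groups; each should yield its own attractor
data = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
a1 = hill_climb(0.0, data, sigma=0.3)
a2 = hill_climb(5.2, data, sigma=0.3)
```

Points that climb to the same attractor belong to the same cluster; points whose attractor density falls below a noise threshold are discarded.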
Query By Image Example: The Candid Approach
1995
Cited by 88 (1 self)
Abstract:
CANDID (Comparison Algorithm for Navigating Digital Image Databases) was developed to enable content-based retrieval of digital imagery from large databases using a query-by-example methodology. A user provides an example image to the system, and images in the database that are similar to that example are retrieved. The development of CANDID was inspired by the N-gram approach to document fingerprinting, where a "global signature" is computed for every document in a database and these signatures are compared to one another to determine the similarity between any two documents. CANDID computes a global signature for every image in a database, where the signature is derived from various image features such as localized texture, shape, or color information. A distance between probability density functions of feature vectors is then used to compare signatures. In this paper, we present CANDID and highlight two results from our current research: subtracting a "background" signature from ever...
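A minimal sketch of the global-signature idea, assuming a single 1-D feature, a normalized histogram as a crude discretized PDF, and an L1 dissimilarity; CANDID's actual signatures and distance measures differ:

```python
def signature(values, bins=8, lo=0.0, hi=1.0):
    """Discretize feature values into a normalized histogram (a crude PDF estimate)."""
    hist = [0.0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)
        hist[idx] += 1.0
    total = sum(hist)
    return [h / total for h in hist]

def l1_distance(p, q):
    """A simple dissimilarity between two discretized PDFs."""
    return sum(abs(a - b) for a, b in zip(p, q))

# a query image's feature values vs. a similar and a dissimilar database image
query = signature([0.1, 0.12, 0.5, 0.55])
similar = signature([0.11, 0.13, 0.52, 0.56])
different = signature([0.9, 0.92, 0.95, 0.97])
```

Retrieval then amounts to ranking database images by their signature distance to the query signature.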
Optimal Grid-Clustering: Towards Breaking the Curse of Dimensionality in High-Dimensional Clustering
1999
Cited by 85 (4 self)
Abstract:
Many applications require the clustering of large amounts of high-dimensional data. Most clustering algorithms, however, do not work effectively and efficiently in high-dimensional space, which is due to the so-called "curse of dimensionality". In addition, high-dimensional data often contains a significant amount of noise, which causes additional effectiveness problems. In this paper, we review and compare the existing algorithms for clustering high-dimensional data and show the impact of the curse of dimensionality on their effectiveness and efficiency. The comparison reveals that condensation-based approaches (such as BIRCH or STING) are the most promising candidates for achieving the necessary efficiency, but it also shows that basically all condensation-based approaches have severe weaknesses with respect to their effectiveness in high-dimensional space. To overcome these problems, we develop a new clustering technique called OptiGrid which is based on constructing an optimal grid...
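The core intuition of grid construction, cutting the space where the data is sparse, can be caricatured in 1-D. This is not the OptiGrid algorithm itself; it is a toy that places a cut at the least dense interior histogram bin of a projection:

```python
def best_cut(values, bins=10, lo=0.0, hi=10.0):
    """Pick a cutting point for a 1-D projection at the least dense
    interior bin: the core idea behind density-aware grid partitioning."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    # only consider interior bins so the cut actually separates data
    idx = min(range(1, bins - 1), key=lambda i: counts[i])
    return lo + (idx + 0.5) * width

data = [1.0, 1.2, 1.4, 8.0, 8.2, 8.4]
cut = best_cut(data)
```

A cut through a sparse region splits no cluster, which is exactly why density-aware grids degrade less in high dimensions than uniform ones.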
A Cost Model for Query Processing in High-Dimensional Data Spaces
2000
Cited by 47 (0 self)
Abstract:
During the last decade, multimedia databases have become increasingly important in many application areas such as medicine, CAD, geography, and molecular biology. An important research issue in the field of multimedia databases is similarity search in large data sets. Most current approaches addressing similarity search use the so-called feature approach, which transforms important properties of the stored objects into points of a high-dimensional space (feature vectors). Thus, the similarity search is transformed into a neighborhood search in the feature space. For the management of the feature vectors, multidimensional index structures are usually applied. The performance of query processing can be substantially improved by opti...
Fast nearest neighbor search in high-dimensional space
In ICDE, 1998
Cited by 46 (0 self)
Abstract:
Similarity search in multimedia databases requires efficient support of nearest-neighbor search on a large set of high-dimensional points as a basic operation for query processing. As recent theoretical results show, state-of-the-art approaches to nearest-neighbor search are not efficient in higher dimensions. In our new approach, we therefore precompute the result of any nearest-neighbor search, which corresponds to a computation of the Voronoi cell of each data point. In a second step, we store the Voronoi cells in an index structure efficient for high-dimensional data spaces. As a result, nearest-neighbor search corresponds to a simple point query on the index structure. Although our technique is based on a precomputation of the solution space, it is dynamic, i.e., it supports insertions of new data points. An extensive experimental evaluation of our technique demonstrates the high efficiency for uniformly distributed as well as real data. We obtained a significant reduction of the search time compared to nearest-neighbor search in the X-tree (up to a factor of 4).

1 Introduction
An important research issue in the field of multimedia databases is the content-based retrieval of similar multimedia objects such as images, text, and videos [Alt+ 90].
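The precomputation idea can be caricatured in 1-D: a uniform grid of cells, each storing its nearest data point, plays the role of the indexed Voronoi cells, and a query becomes a simple table lookup. The paper's actual technique approximates high-dimensional Voronoi cells and stores them in a proper index structure:

```python
def build_lookup(points, cells=100, lo=0.0, hi=1.0):
    """Precompute, for each grid cell, the nearest data point: a crude
    1-D stand-in for storing (approximate) Voronoi cells in an index."""
    width = (hi - lo) / cells
    table = []
    for c in range(cells):
        center = lo + (c + 0.5) * width
        table.append(min(points, key=lambda p: abs(p - center)))
    return table, lo, width

def nn_query(q, lookup):
    """Nearest-neighbor search reduced to a point query on the precomputed table."""
    table, lo, width = lookup
    idx = min(int((q - lo) / width), len(table) - 1)
    return table[idx]

points = [0.1, 0.4, 0.9]
lookup = build_lookup(points)
```

The search-time work is constant per query; all distance computation was paid once, at build (or insert) time.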
Toward improved ranking metrics
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000
Cited by 42 (13 self)
Abstract:
In many computer vision algorithms, a metric or similarity measure is used to determine the distance between two features. The Euclidean or SSD (sum of squared differences) metric is prevalent and is justified from a maximum-likelihood perspective when the additive noise distribution is Gaussian. Based on real noise distributions measured from international test sets, we have found that the Gaussian noise distribution assumption is often invalid. This implies that other metrics, whose distributions are closer to the real noise distribution, should be used. In this paper, we consider three different applications: content-based retrieval in image databases, stereo matching, and motion tracking. In each of them, we experiment with different modeling functions for the noise distribution and compute the accuracy of the methods using the corresponding distance measures. In our experiments, we compared the SSD metric, the SAD (sum of absolute differences) metric, the Cauchy metric, and the Kullback relative information. For several algorithms from the research literature which used the SSD or SAD, we showed that greater accuracy could be obtained by using the Cauchy metric instead.

Index Terms: Maximum likelihood, ranking metrics, content-based retrieval, color indexing, stereo matching, motion tracking.
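The compared metrics can be sketched directly. The Cauchy metric's scale parameter `s` is a free choice here, and the exact functional forms are common textbook versions rather than necessarily the paper's parameterizations; note how the Cauchy metric's heavy tails penalize a single large (outlier) difference far less than SSD does:

```python
import math

def ssd(a, b):
    """Sum of squared differences: ML-optimal under Gaussian noise."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def sad(a, b):
    """Sum of absolute differences: ML-optimal under Laplacian noise."""
    return sum(abs(x - y) for x, y in zip(a, b))

def cauchy(a, b, s=1.0):
    """Cauchy metric with scale s: grows only logarithmically for large
    differences, so outliers dominate the distance far less than in SSD."""
    return sum(math.log(1.0 + ((x - y) / s) ** 2) for x, y in zip(a, b))

def kullback(p, q):
    """Kullback relative information between two discrete PDFs p and q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

For example, against [0, 0, 0], SSD ranks [2, 2, 2] closer than [5, 0, 0] (12 vs. 25), while the Cauchy metric ranks the single-outlier vector [5, 0, 0] closer, which is the robustness property the paper exploits.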
Dynamically optimizing high-dimensional index structures
In Proc. Int. Conf. on Extending Database Technology (EDBT), 2000
Cited by 15 (4 self)
Abstract:
In high-dimensional query processing, the optimization of the logical page size of index structures is an important research issue. Even very simple query processing techniques such as the sequential scan are able to outperform indexes which are not suitably optimized. Page-size optimization based on a cost model faces the problem that the optimum depends not only on static schema information such as the dimension of the data space but also on dynamically changing parameters such as the number of objects stored in the database and the degree of clustering and correlation in the current data set. Therefore, we propose a method for adapting the page size of an index dynamically during insert processing. Our solution, called the DABS-tree, uses a flat directory whose entries consist of an MBR, a pointer to the data page, and the size of the data page. Before splitting pages in insert operations, a cost model is consulted to estimate whether the split operation is beneficial. Otherwise, the split is avoided and the logical page size is adapted instead. A similar rule applies for merging when performing delete operations. We present an algorithm for the management of data pages with varying page sizes in an index and show that all restructuring operations are locally restricted. We show in our experimental evaluation that the DABS-tree outperforms the X-tree by a factor of up to 4.6 and the sequential scan by a factor of up to 6.6.
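The split-or-grow decision can be caricatured with a toy cost model. Everything below (the `seek`/`transfer` parameters, the overlap-probability argument) is a hypothetical stand-in for the paper's actual cost model; the point is only the decision structure: consult a model, and split only when the model says it pays off:

```python
def should_split(page_size, overlap_p, seek=10.0, transfer=0.01):
    """Toy split-or-grow rule. Splitting is worthwhile only if queries rarely
    have to visit both halves; overlap_p is the (assumed) probability that a
    query intersects both new pages, which grows toward 1 in high dimensions."""
    per_page = seek + transfer * page_size
    cost_split = (1.0 + overlap_p) * per_page    # a query visits 1 or 2 half pages
    cost_grow = seek + transfer * 2 * page_size  # always 1 larger page
    return cost_split < cost_grow

# low overlap (low-dimensional data): splitting wins
low_dim = should_split(4096, overlap_p=0.1)
# heavy overlap (high-dimensional data): keep one larger page instead
high_dim = should_split(4096, overlap_p=0.9)
```

Under this toy model, the same insert triggers a split for well-separated data but a page-size enlargement for overlap-prone high-dimensional data, which mirrors the adaptive behavior described above.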
On optimizing nearest neighbor queries in high-dimensional data spaces
In Proceedings of the 8th International Conference on Database Theory (ICDT), 2001
Cited by 9 (0 self)
Abstract:
Nearest-neighbor queries in high-dimensional space are of high importance in various applications, especially in content-based indexing of multimedia data. For an optimization of the query processing, accurate models for estimating the query processing costs are needed. In this paper, we propose a new cost model for nearest-neighbor queries in high-dimensional space, which we apply to enhance the performance of high-dimensional index structures. The model is based on new insights into effects occurring in high-dimensional space and provides a closed formula for the processing costs of nearest-neighbor queries depending on the dimensionality, the block size, and the database size. From the wide range of possible applications of our model, we select two interesting samples: first, we use the model to prove the known linear complexity of the nearest-neighbor search problem in high-dimensional space, and second, we provide a technique for optimizing the block size. For data of medium dimensionality, the optimized block size allows significant speedups of the query processing time when compared to traditional block sizes and to the linear scan.
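The block-size optimization amounts to minimizing a modeled query cost over candidate block sizes. The cost formula below is an invented stand-in, not the paper's closed formula; it only reproduces the qualitative trade-off (larger blocks mean fewer expensive seeks but more data transferred per block):

```python
def nn_query_cost(block_size, dim, db_size, seek=8.0, transfer=0.002):
    """Hypothetical cost model: number of blocks read times per-block I/O cost.
    The sqrt shape of blocks_read is an assumption for illustration only."""
    blocks_read = max(1.0, db_size * dim / block_size) ** 0.5
    return blocks_read * (seek + transfer * block_size)

def best_block_size(dim, db_size, candidates):
    """Pick the candidate block size with the lowest modeled query cost."""
    return min(candidates, key=lambda b: nn_query_cost(b, dim, db_size))

optimal = best_block_size(16, 100000, [1024, 4096, 16384, 65536])
```

Because the model is a closed formula, the optimization is a cheap one-dimensional search that can be redone whenever the database size or dimensionality changes.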
Efficiency Issues Related to Probability Density Function Comparison
In SPIE Storage and Retrieval for Image and Video Databases, 1996
Cited by 5 (0 self)
Abstract:
The CANDID project (Comparison Algorithm for Navigating Digital Image Databases) employs probability density functions (PDFs) of localized feature information to represent the content of an image for search and retrieval purposes. A similarity measure between PDFs is used to identify database images that are similar to a user-provided query image. Unfortunately, signature comparison involving PDFs is a very time-consuming operation. In this paper, we look into some efficiency considerations when working with PDFs. Since PDFs can take on many forms, we look into trade-offs between accurate representation and efficiency of manipulation for several data sets. In particular, we typically represent each PDF as a Gaussian mixture (i.e., as a weighted sum of Gaussian kernels) in the feature space. We find that by constraining all Gaussian kernels to have principal axes that are aligned to the natural axes of the feature space, computations involving these PDFs are simplified. We can also constr...
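The simplification bought by axis-aligned kernels is visible in code: with a diagonal covariance, each kernel factors into a product of 1-D Gaussians, so evaluating the mixture needs no matrix inversion or determinant. The parameter values below are illustrative, not from the paper:

```python
import math

def diag_gauss(x, mean, var):
    """Axis-aligned Gaussian kernel: a product of 1-D Gaussians, so no
    covariance-matrix inversion or determinant computation is needed."""
    p = 1.0
    for xi, mi, vi in zip(x, mean, var):
        p *= math.exp(-((xi - mi) ** 2) / (2 * vi)) / math.sqrt(2 * math.pi * vi)
    return p

def mixture_pdf(x, weights, means, vars_):
    """Gaussian mixture: a weighted sum of axis-aligned Gaussian kernels."""
    return sum(w * diag_gauss(x, m, v) for w, m, v in zip(weights, means, vars_))

# a 2-D, two-component mixture with axis-aligned (diagonal) covariances
weights = [0.5, 0.5]
means = [(0.0, 0.0), (3.0, 3.0)]
vars_ = [(1.0, 1.0), (1.0, 1.0)]
```

Evaluating the density at a point costs O(components x dimensions), versus the cubic-in-dimension linear algebra that full covariance matrices would require.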
Multilevel Color Histogram Representation of Color Images by Peaks
1999
Cited by 4 (1 self)
Abstract:
This paper proposes the use of a vector of color histogram peaks as an efficient and effective representation for many image indexing problems. It shows that histogram peaks are more stable than general histogram bins when there are variations of scale. We also introduce the structure of a room recognition system which applies this indexing technique to omnidirectional images of rooms. Experimental results show that using only peaks leads to significantly lower time and storage demands and still provides good recognition rates across a database of hundreds of rooms.
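A minimal sketch of peak extraction from a histogram. The detection rule here is a plain local-maximum test with a height threshold, an assumption for illustration rather than the paper's exact method:

```python
def histogram_peaks(hist, min_height=1):
    """Indices of local maxima of a histogram; the peak vector is a much
    smaller (and, per the paper, more stable) index key than the full histogram."""
    peaks = []
    for i in range(1, len(hist) - 1):
        if hist[i] >= min_height and hist[i] > hist[i - 1] and hist[i] > hist[i + 1]:
            peaks.append(i)
    return peaks

# a toy color histogram with two dominant colors
hist = [0, 2, 9, 3, 1, 0, 7, 2, 0]
peaks = histogram_peaks(hist)
```

Matching then compares short peak vectors instead of full histograms, which is where the time and storage savings come from.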