Results 1-10 of 18
Fuzzy-Fingerprint for Text-Based Information Retrieval
Proceedings of I-Know ’05. Austria: Maurer Tochtermann, 2005
Cited by 14 (5 self)
Abstract: This paper introduces a particular form of fuzzy-fingerprints, covering their construction, their interpretation, and their use in the field of information retrieval. Though the concept of fingerprinting in general is not new, the way of using them within a similarity search as described here is: instead of computing the similarity between two fingerprints in order to assess the similarity between the associated objects, simply the event of a fingerprint collision is used for a similarity assessment. The main impact of this approach is the small number of comparisons necessary to conduct a similarity search.
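The collision idea in this abstract can be illustrated with a minimal Python sketch. The abstract does not describe how the fingerprints are built, so `toy_fingerprint` below is a hypothetical stand-in (coarsely quantized shares of words per first-letter group), not the paper's construction. The names `build_index` and `candidates` are likewise assumptions. The point shown is only that a similarity query becomes a single hash lookup instead of pairwise comparisons.

```python
from collections import defaultdict


def toy_fingerprint(text):
    """Hypothetical fingerprint: coarse shares of words per first-letter group."""
    words = [w for w in text.lower().split() if w[0].isalpha()]
    groups = [0, 0, 0, 0]  # a-f, g-l, m-r, everything after r
    for w in words:
        first = w[0]
        if first <= "f":
            groups[0] += 1
        elif first <= "l":
            groups[1] += 1
        elif first <= "r":
            groups[2] += 1
        else:
            groups[3] += 1
    total = max(len(words), 1)
    # Quantize each share into three coarse levels so that small changes
    # in the text often leave the fingerprint unchanged.
    return tuple(0 if g / total < 0.2 else 1 if g / total < 0.4 else 2
                 for g in groups)


def build_index(docs):
    """Map each fingerprint to the ids of the documents that produce it."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        index[toy_fingerprint(text)].append(doc_id)
    return index


def candidates(index, query_text):
    """A collision with the query fingerprint counts as 'similar'."""
    return index.get(toy_fingerprint(query_text), [])


docs = {
    "rug": "the cat sat on the rug",
    "dog": "a big dog barked at a bird",
}
index = build_index(docs)
hits = candidates(index, "the cat sat on the mat")
```

Here the query collides with the document stored under `"rug"` and not with the one under `"dog"`, and the lookup cost does not depend on the number of indexed documents.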
Nearest Neighbor Search in Multidimensional Spaces
1999
Cited by 10 (0 self)
The Nearest Neighbor Search problem is defined as follows: given a set P of n points, preprocess the points so as to efficiently answer queries that require finding the closest point in P to a query point q. If we are willing to settle for a point that is almost as close as the nearest neighbor, then we can relax the problem to the approximate Nearest Neighbor Search. Nearest Neighbor Search (exact or approximate) is an integral component in a wide range of applications that include multimedia databases, computational biology, data mining, and information retrieval. The common thread in all these applications is similarity search: given a database of objects, we want to return the object in the database that is most similar to a query object. The objects are mapped onto points in a high-dimensional metric space, and similarity search reduces to a nearest neighbor search. The dimension of the underlying space may be on the order of a few hundred, or thousands; therefore, we r...
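As a small illustration of the problem statement above, the following Python sketch answers an exact query by brute force and checks whether a candidate qualifies as an approximate answer. The abstract only says "almost as close"; the (1 + eps) factor used here is the common formalization and is an assumption, as are the function names and the choice of Euclidean distance.

```python
import math


def nearest_neighbor(points, q):
    """Exact answer by linear scan: the point of P closest to q."""
    return min(points, key=lambda p: math.dist(p, q))


def is_approximate_nn(points, q, candidate, eps):
    """True if candidate is within (1 + eps) times the true NN distance."""
    best = math.dist(nearest_neighbor(points, q), q)
    return math.dist(candidate, q) <= (1 + eps) * best


P = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
q = (1.0, 2.0)
exact = nearest_neighbor(P, q)  # (1.0, 1.0), at distance 1
```

The linear scan needs no preprocessing but touches all n points per query, which is the cost that index structures for this problem try to avoid.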
Effective Indexing and Filtering for Similarity Search in Large Biosequence Databases
In Third IEEE Symposium on BioInformatics and BioEngineering (BIBE’03), 2003
Cited by 7 (0 self)
We present a multidimensional indexing approach for fast sequence similarity search in DNA and protein databases. In particular, we propose effective transformations of subsequences into numerical vector domains and build efficient index structures on the transformed vectors. We then define distance functions in the transformed domain and examine properties of these functions. We experimentally compared their (a) approximation quality for k-Nearest Neighbor (k-NN) queries, (b) pruning ability and (c) approximation quality for ε-range queries. Results for k-NN queries, which we present here, show that our proposed distances FD2 and WD2 (i.e. Frequency and Wavelet Distance functions for 2-grams) perform significantly better than the others. We then develop effective index structures, based on R-trees and scalar quantization, on top of the transformed vectors and distance functions. Promising results from the experiments on real biosequence data sets are presented.
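The transformation of a subsequence into a numerical vector can be sketched as follows. This is an assumption-laden illustration: it counts 2-grams over a DNA alphabet and compares the count vectors with a plain L1 distance. The abstract does not define FD2 or WD2, so `l1_distance` is a simple stand-in and not the paper's distance function.

```python
from itertools import product


def two_gram_vector(seq, alphabet="ACGT"):
    """Count every 2-gram over the alphabet; 16 dimensions for DNA."""
    grams = ["".join(pair) for pair in product(alphabet, repeat=2)]
    counts = dict.fromkeys(grams, 0)
    for i in range(len(seq) - 1):
        gram = seq[i:i + 2]
        if gram in counts:  # skip 2-grams containing unknown symbols
            counts[gram] += 1
    return [counts[g] for g in grams]


def l1_distance(u, v):
    """Stand-in distance between two 2-gram count vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


v1 = two_gram_vector("ACGT")  # contains AC, CG, GT
v2 = two_gram_vector("ACGA")  # contains AC, CG, GA
d = l1_distance(v1, v2)
```

Because the vectors have a fixed, small dimensionality, they can be stored in a multidimensional index and searched without comparing raw sequences.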
Metric Indexing for the Vector Model in Text Retrieval
In SPIRE, Padova, Italy. LNCS 3246, 2004
Cited by 5 (2 self)
Abstract. In the area of Text Retrieval, processing a query in the vector model has been verified to be qualitatively more effective than searching in the Boolean model. However, in the case of the classic vector model, the current methods of processing many-term queries are inefficient, and in the case of the LSI model there is no efficient method for processing even few-term queries. In this paper we propose a method of vector query processing based on metric indexing, which is efficient especially for the LSI model. In addition, we propose a concept of approximate semi-metric search, which can further improve the efficiency of the retrieval process. Results of experiments made on a moderate text collection are included.
Kernel Vector Approximation Files for Relevance Feedback Retrieval in Large Image Databases
Multimedia Tools and Applications, vol. 25, 2004
Cited by 2 (0 self)
space and do not support relevance feedback retrieval. The vector approximation file (VA-File) approach overcomes some of the difficulties of high-dimensional vector spaces, but cannot be applied to relevance feedback retrieval using kernel distances in the data measurement space. This paper introduces a novel KVA-File (kernel VA-File) that extends VA-File to kernel-based retrieval methods. An efficient approach to approximating vectors in an induced feature space is presented with the corresponding upper and lower distance bounds. Thus an effective indexing method is provided for kernel-based relevance feedback image retrieval methods.
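The role of upper and lower distance bounds can be illustrated with a sketch of the plain VA-File idea in the original data space. It is not the kernel extension (KVA-File) this abstract introduces, whose feature-space bounds are not given here. The uniform grid over [0, 1), the `bits` parameter, and the function names are assumptions made for illustration.

```python
import math


def approximate(vec, bits=2):
    """Replace each coordinate in [0, 1) by the index of its grid cell."""
    cells = 1 << bits
    return [min(int(x * cells), cells - 1) for x in vec]


def distance_bounds(query, approx, bits=2):
    """Lower and upper Euclidean distance bounds from the cell indexes alone."""
    cells = 1 << bits
    lo_sq = hi_sq = 0.0
    for qv, c in zip(query, approx):
        lo_edge, hi_edge = c / cells, (c + 1) / cells
        if qv < lo_edge:
            near = lo_edge - qv
        elif qv > hi_edge:
            near = qv - hi_edge
        else:
            near = 0.0  # query coordinate falls inside the cell
        far = max(abs(qv - lo_edge), abs(qv - hi_edge))
        lo_sq += near * near
        hi_sq += far * far
    return math.sqrt(lo_sq), math.sqrt(hi_sq)


vec = (0.1, 0.9)
query = (0.6, 0.6)
cell = approximate(vec)  # [0, 3] with 2 bits per dimension
lower, upper = distance_bounds(query, cell)
true_dist = math.dist(vec, query)
```

A search can discard any vector whose lower bound already exceeds the best upper bound seen so far, so only a fraction of the full vectors has to be read.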
Indexing and Selection of Data Items in Huge Data Sets by Constructing and Accessing Tag Collections
Cited by 2 (0 self)
We present here a new way of indexing and retrieving data in huge datasets having a high dimensionality. The proposed method speeds up the selection process by replacing scans of the whole data with scans of matching data. It makes use of two levels of catalogs that allow efficient data preselections. First-level catalogs contain only a small subset of the data items, selected according to given criteria. They make it possible to carry out queries and to preselect items. Then, a refined query can be carried out on the preselected data items within the full dataset. A second-level catalog maintains the list of existing first-level catalogs and the type and kind of data items they store.
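The two catalog levels described above can be sketched in a few lines of Python. All names, the record fields (`energy`, `tracks`), and the in-memory dictionaries are hypothetical; the abstract does not specify storage formats or the selection criteria.

```python
def build_first_level(dataset, criteria):
    """One first-level catalog per criterion: ids of the matching items."""
    return {
        name: [item_id for item_id, item in dataset.items() if pred(item)]
        for name, pred in criteria.items()
    }


def build_second_level(first_level, descriptions):
    """Second-level catalog: which first-level catalogs exist and what they hold."""
    return {
        name: {"description": descriptions[name], "size": len(ids)}
        for name, ids in first_level.items()
    }


def refined_query(dataset, first_level, catalog, refine):
    """Scan only the preselected items instead of the whole dataset."""
    return [i for i in first_level[catalog] if refine(dataset[i])]


dataset = {
    1: {"energy": 5.0, "tracks": 2},
    2: {"energy": 50.0, "tracks": 8},
    3: {"energy": 75.0, "tracks": 1},
}
criteria = {"high_energy": lambda item: item["energy"] > 40}
first = build_first_level(dataset, criteria)
second = build_second_level(first, {"high_energy": "items with energy > 40"})
result = refined_query(dataset, first, "high_energy",
                       lambda item: item["tracks"] > 5)
```

The refined query touches two preselected items rather than all three; on a huge dataset that reduction is where the speed-up would come from.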
Extending an Index-Benchmarking Framework with Non-Invasive Visualization Capability
Cited by 2 (1 self)
Abstract: Finding a suitable multidimensional index structure for a data-intensive system is not a trivial task. The QuEval framework supports users in finding the best index structure from a list of candidates. Nevertheless, if an index structure shows itself superior to other index structures most of the time, but fails for one data set, we want to know the reason for this phenomenon. To support an understanding of deficits, a visualization of the partitioning scheme is helpful. Consequently, we propose a visualization component which interacts with QuEval without affecting the performance evaluation. Thus, we use a modern software-engineering approach based on AspectJ to support Digital Engineering of complex solutions.
Challenges in Finding an Appropriate Multi-Dimensional Index Structure with Respect to Specific Use Cases
Cited by 1 (1 self)
In recent years, index structures for managing multidimensional data have become increasingly important. Due to heterogeneous systems and specific use cases, it is a complex challenge to find an appropriate index structure for specific problems, such as finding similar fingerprints or micro traces in a database. One aspect that should be considered in general is the dimensionality and the related curse of dimensionality. However, dimensionality of data is just one component that has to be considered. To address the challenges of finding the appropriate index, we motivate the necessity of a framework to evaluate indexes for specific use cases. Furthermore, we discuss core components of a framework that supports users in finding the most appropriate index structure for their use case.
QuEval: Beyond high-dimensional indexing à la carte
PVLDB, 2013
Cited by 1 (0 self)
In the recent past, the amount of high-dimensional data, such as feature vectors extracted from multimedia data, has increased dramatically. A large variety of indexes have been proposed to store and access such data efficiently. However, due to specific requirements of a certain use case, choosing an adequate index structure is a complex and time-consuming task. This may be due to engineering challenges or open research questions. To overcome this limitation, we present QuEval, an open-source framework that can be flexibly extended w.r.t. index structures, distance metrics, and data sets. QuEval provides a unified environment for a sound evaluation of different indexes, for instance, to support tuning of indexes. In an empirical evaluation, we show how to apply our framework, motivate benefits, and demonstrate analysis possibilities.
Privacy-Aware Multidimensional Indexing
Abstract: Deleting data from a database system in a forensically secure and highly performant way is a complex challenge. Due to redundant copies and additional information stored about data items, it is not sufficient to delete only the data items themselves. Additional challenges arise when using multidimensional index structures, because information about data items is used to index the space. As an initial result, we present different deletion levels to overcome this challenge. Based on this classification, we analyze how data can be reconstructed from the index and modify index structures to improve privacy of data items. Second, we benchmark our index structure modifications and quantify their effect. Our results indicate that forensically secure deletion is possible by modifying multidimensional index structures, in some cases with only a small impact on computational performance.