The grid file: An adaptable, symmetric multikey file structure
ACM Transactions on Database Systems, 1984
Cited by 362 (4 self)
Traditional file structures that provide multikey access to records, for example, inverted files, are extensions of file structures originally designed for single-key access. They manifest various deficiencies, in particular for multikey access to highly dynamic files. We study the dynamic aspects of file structures that treat all keys symmetrically, that is, file structures which avoid the distinction between primary and secondary keys. We start from a bitmap approach and treat the problem of file design as one of data compression of a large sparse matrix. This leads to the notions of a grid partition of the search space and of a grid directory, which are the keys to a dynamic file structure called the grid file. This file system adapts gracefully to its contents under insertions and deletions, and thus achieves an upper bound of two disk accesses for single record retrieval; it also handles range queries and partially specified queries efficiently. We discuss in detail the design decisions that led to the grid file, present simulation results of its behavior, and compare it to other multikey access file structures.
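The grid partition and grid directory described in this abstract can be illustrated with a small in-memory sketch. It is not the paper's implementation: the class name `GridFileSketch`, the two-key restriction, and the fixed scales are assumptions made for illustration, and the dynamic splitting and merging of cells under insertions and deletions is left out entirely. A lookup here takes two steps (find the cell in the directory, then read its bucket), which loosely mirrors the two-access bound the abstract mentions.

```python
from bisect import bisect_right


class GridFileSketch:
    """Toy, static, two-key grid file held in memory (hypothetical names)."""

    def __init__(self, x_scale, y_scale):
        # Linear scales: sorted boundaries partitioning each key's domain.
        self.x_scale = list(x_scale)
        self.y_scale = list(y_scale)
        nx, ny = len(self.x_scale) + 1, len(self.y_scale) + 1
        # Grid directory: one bucket id per cell. For simplicity every cell
        # gets its own bucket; several cells could share one bucket instead.
        self.directory = [[ix * ny + iy for iy in range(ny)] for ix in range(nx)]
        self.buckets = {b: [] for row in self.directory for b in row}

    def _cell(self, x, y):
        # Both keys are treated symmetrically: one scale lookup per key.
        return bisect_right(self.x_scale, x), bisect_right(self.y_scale, y)

    def insert(self, x, y, record):
        ix, iy = self._cell(x, y)
        self.buckets[self.directory[ix][iy]].append((x, y, record))

    def lookup(self, x, y):
        # Step 1: directory entry for the cell. Step 2: scan that bucket.
        ix, iy = self._cell(x, y)
        bucket = self.buckets[self.directory[ix][iy]]
        return [r for (bx, by, r) in bucket if (bx, by) == (x, y)]

    def range_query(self, x_lo, x_hi, y_lo, y_hi):
        # Visit only the cells that overlap the query rectangle.
        ix_lo, iy_lo = self._cell(x_lo, y_lo)
        ix_hi, iy_hi = self._cell(x_hi, y_hi)
        seen, out = set(), []
        for ix in range(ix_lo, ix_hi + 1):
            for iy in range(iy_lo, iy_hi + 1):
                b = self.directory[ix][iy]
                if b in seen:
                    continue
                seen.add(b)
                out.extend(r for (bx, by, r) in self.buckets[b]
                           if x_lo <= bx <= x_hi and y_lo <= by <= y_hi)
        return out


grid = GridFileSketch(x_scale=[10, 20], y_scale=[5])
grid.insert(3, 7, "a")
grid.insert(15, 2, "b")
grid.insert(25, 9, "c")
print(grid.lookup(15, 2))
print(grid.range_query(0, 16, 0, 10))
```

A partially specified query (one key given, the other free) would reduce to a range query whose free key spans the whole domain.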
Evaluation of Signature Files as Set Access Facilities in OODBs
In Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, 1993
Cited by 55 (3 self)
Object-oriented database systems (OODBs) need efficient support for manipulation of complex objects. In particular, support of queries involving evaluations of set predicates is often required in handling complex objects. In this paper, we propose a scheme to apply signature file techniques, which were originally invented for text retrieval, to the support of set value accesses, and quantitatively evaluate their potential capabilities. Two signature file organizations, the sequential signature file and the bit-sliced signature file, are considered and their performance is compared with that of the nested index for queries involving the set inclusion operator (⊆). We develop a detailed cost model and present analytical results clarifying their retrieval, storage, and update costs. Our analysis shows that the bit-sliced signature file is a very promising set access facility in OODBs. 1 INTRODUCTION Advanced database application areas, such as computer aided design, office automation, and...
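As a rough illustration of the bit-sliced organization named in this abstract, the sketch below stores set signatures column-wise and answers "which stored sets may contain the query set" by ANDing only the slices selected by the query signature. Everything here is an assumption made for illustration: the names `element_signature`, `set_signature`, and `BitSlicedSignatureFile`, the signature width `m`, the bits per element `k`, and the use of SHA-256 as a stand-in hash. It does not reproduce the paper's cost model or its exact scheme.

```python
import hashlib


def element_signature(element, m=32, k=2):
    """Set up to k pseudo-random bits of an m-bit signature for one element."""
    sig = 0
    for i in range(k):
        digest = hashlib.sha256(f"{i}:{element}".encode("utf-8")).digest()
        sig |= 1 << (int.from_bytes(digest[:8], "big") % m)
    return sig


def set_signature(elements, m=32, k=2):
    """Superimpose (bitwise OR) the signatures of all elements of a set."""
    sig = 0
    for element in elements:
        sig |= element_signature(element, m, k)
    return sig


class BitSlicedSignatureFile:
    """Column-wise signature store: slices[j] has bit i set iff
    the signature of object i has bit j set."""

    def __init__(self, m=32, k=2):
        self.m, self.k = m, k
        self.slices = [0] * m
        self.count = 0

    def add(self, elements):
        sig = set_signature(elements, self.m, self.k)
        oid = self.count
        for j in range(self.m):
            if (sig >> j) & 1:
                self.slices[j] |= 1 << oid
        self.count += 1
        return oid

    def candidates_containing(self, query_elements):
        """Ids of objects whose set MAY contain every query element.
        False drops are possible; false dismissals are not."""
        q = set_signature(query_elements, self.m, self.k)
        result = (1 << self.count) - 1  # start with every object
        for j in range(self.m):
            if (q >> j) & 1:
                result &= self.slices[j]  # only query-selected slices are read
        return [i for i in range(self.count) if (result >> i) & 1]


sf = BitSlicedSignatureFile()
sf.add({"baseball", "tennis"})
sf.add({"golf"})
print(sf.candidates_containing({"tennis"}))
```

The returned ids are only candidates: each one would still need to be checked against the stored set to discard false drops.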
Improved Methods for Signature-Tree Construction
The Computer Journal, 2000
Cited by 14 (4 self)
we locate a number of reasons for this problem and propose several methods for node splitting and partial-tree restructuring, which lead to improved query-response times. We have implemented all methods and we present experimental results, which indicate that the proposed methods are superior in all cases to the standard one and up to 5-10 times better for medium and higher weights in inclusive (partial match) queries. Additionally, we have developed new functions for the performance estimation of signature trees which, in contrast to a previous estimation function, are able to take into account the outcome of different split methods and to provide more accurate estimation
Spatial Similarity-Based Retrievals and Image Indexing By Hierarchical Decomposition
Proceedings of the International Database Engineering and Application Symposium (IDEAS’97), 1997
Cited by 10 (4 self)
For efficient search and spatial similarity-based retrieval of image contents, this paper introduces a new symbolic image representation and indexing technique. In this technique, an image is recursively decomposed into a spatial arrangement of feature points while preserving the spatial relationships among its various components. Quadtrees are used to manage the decomposition hierarchy and help in quantifying the measure of similarity. This scheme is incremental in nature and can be adopted to find a match at various levels of detail, from coarse to fine. This approach is translation, rotation and scale independent. For search and retrieval, a two-phase indexing scheme based on image signatures and quadtree matching is introduced. For a given query image, a facility is provided to rank order the spatially similar images retrieved from the image database for subsequent browsing and user selection. Keywords: Image Databases, Symbolic Image Representation, I...
Signature-based Structures for Objects with Set-valued Attributes
2002
Cited by 8 (1 self)
Aiming at the efficient retrieval of objects with set-valued attributes, we introduce three variations of a new method in order to satisfy subset and superset queries. Our approach is to combine the advantages of two access methods, linear hashing and tree-shaped methods, on which other similar methods have previously been reported as well. Analytical performance estimation functions for each particular method are presented, followed by a thorough experimental comparison of all investigated structures, where analytical and experimental results deviate by 10% on average. Finally, the results of this performance evaluation are presented and discussed, clearly showing the superiority of the new methods, which reach an improvement of up to 85%.
A superimposed coding scheme based on multiple block descriptor files for indexing very large databases
In Proc. 14th Conf. VLDB, 1988
Cited by 4 (0 self)
A new signature file method for accessing information from large data files containing both formatted and free text data is presented. The new method, called the multi-organizational scheme, is proposed for indexing very large data files containing hundreds of thousands or possibly millions of records.
Massive Parallelism on the Hybrid Text Retrieval Machine
Information Processing & Management, Vol. 31, No. 6, 1995
Cited by 3 (0 self)
The design of a high-performance, cost-effective, machine for retrieving textual data is discussed in this paper. High performance and cost effectiveness are achieved by a combination of low-cost hard disks, software filtering techniques, and a large amount of main memory. The discussion focuses on the signature processor, which is based on the partitioned signature file technique, and the mass storage system, which is based on a disk array. A performance evaluation on the individual system components, namely, the signature processor and the mass storage system, as well as the entire system is presented. *The author is on leave from the Department of Computer and Information Science, The Ohio State University, Columbus, OH 43210. 1 Introduction Information retrieval has been a very important application for computers. However, the massive amount of information handled by information retrieval applications such as library systems and office automation systems often overwhelms the lar...
Coder Lexicon: The Collins English Dictionary and its Adverb Definitions
1986
Cited by 3 (0 self)
The CODER (COmposite Document Expert/extended/effective Retrieval) project is an investigation of the applicability of artificial intelligence techniques to the information retrieval task of analyzing, storing, and retrieving heterogeneous collections of “composite documents.” In order to support some of the processing desired, and to allow experimentation in information retrieval and natural language processing, a lexicon was constructed from the machine readable Collins Dictionary of the English Language. After giving background, motivation, and a survey of related work, the Collins lexicon is discussed. Following is a description of the conversion process, the format of the resulting Prolog database, and characteristics of the dictionary and relations. To illustrate what is present and to explain how it relates to the files produced from Webster's Seventh New Collegiate Dictionary, a number of comparative charts are given. Finally, a grammar for adverb definitions is presented, together with a description of defining formulae that usually indicate the type of the adverb. Ultimately it is hoped that definitions for adverbs and other words will be parsed so that the relational lexicon being constructed will include many additional relationships and other knowledge about words and their usage.
The Design of Text Signatures for Text Retrieval Systems
1994
Cited by 1 (0 self)
Signature files are one technique for indexing documents for full-text retrieval systems. This paper discusses two methods for generating text signatures -- the word fragmentation and the pseudo-random generation techniques. The paper evaluates the effectiveness and efficiency of generating text signatures using these techniques. It also determines the optimal set of characteristics that define a text signature that is to be used for superimposed signature file indexes. The optimal set of characteristics can be used to create text signatures that minimise the number of false drops retrieved from the information system. Keywords: Full-text retrieval; Searching; Signature Files; Superimposed coding; Text retrieval systems; Text signatures. 1. Introduction A text retrieval system is characterised by two components. The text database consists of a collection of text documents. The documents can either be unstructured (that is, devoid of any of the traditional database field str...
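To make the idea of a superimposed text signature and a false drop concrete, here is a minimal sketch. It is not either of the two generation techniques evaluated in the paper: a cryptographic hash is swapped in as a generic pseudo-random bit selector, and the function names and the parameters `m` (signature width in bits) and `k` (bits set per word) are assumptions chosen for illustration.

```python
import hashlib


def word_signature(word, m=64, k=3):
    """Map a word to an m-bit signature with at most k bits set."""
    sig = 0
    for i in range(k):
        digest = hashlib.sha256(f"{i}:{word.lower()}".encode("utf-8")).digest()
        sig |= 1 << (int.from_bytes(digest[:8], "big") % m)
    return sig


def text_signature(words, m=64, k=3):
    """Superimposed coding: OR the word signatures into one text signature."""
    sig = 0
    for word in words:
        sig |= word_signature(word, m, k)
    return sig


def may_contain(doc_sig, query_words, m=64, k=3):
    """True if every query bit is set in the document signature.

    A True answer for a document that lacks the words is a false drop;
    a document that does contain the words is never rejected.
    """
    query_sig = text_signature(query_words, m, k)
    return (doc_sig & query_sig) == query_sig


doc = ["signature", "files", "index", "documents"]
doc_sig = text_signature(doc)
print(may_contain(doc_sig, ["signature", "index"]))
```

Widening `m` or lowering `k` leaves fewer bits set per signature, which is the kind of trade-off that determines how many false drops a superimposed index produces.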
Efficient Signature File Methods for Text Retrieval
IEEE Trans. Knowledge and Data Eng., 1995
Cited by 1 (0 self)
Signature files have been studied extensively as an access method for textual databases. Many approaches have been proposed for searching signature files efficiently. However, different methods make different assumptions and use different performance measures, making it difficult to compare their performance. In this paper, we study three basic methods proposed in the literature, namely, the indexed descriptor file, the two-level superimposed coding scheme, and the partitioned signature file approach. The contribution of this paper is two-fold. First, we present a uniform analytical performance model so that the methods can be compared fairly and consistently. The analysis shows that the two-level superimposed coding scheme, if stored in a transposed file, has the best performance. Second, we extend the two-level superimposed coding method into a multi-level superimposed coding method; we obtain the optimal number of levels for the multi-level method and show that for databases with ...

