Results 1 - 10 of 28
A survey of information retrieval and filtering methods
1995
Cited by 82 (0 self)
We survey the major techniques for information retrieval. In the first part, we provide an overview of the traditional ones (full text scanning, inversion, signature files and clustering). In the second part we discuss attempts to include semantic information (natural language processing, latent semantic indexing and neural networks).
An evaluation of generic bulk loading techniques
2001
Cited by 16 (1 self)
Bulk loading refers to the process of creating an index from scratch for a given data set. This problem is well understood for B-trees, but so far, non-traditional index structures have received only modest attention. We are particularly interested in fast generic bulk loading techniques whose implementations only employ a small interface that is satisfied by a broad class of index structures. Generic techniques are very attractive to extensible database systems since different user-implemented index structures implementing that small interface can be bulk-loaded without any modification of the generic code. The main contribution of the paper is the proposal of two new generic and conceptually simple bulk loading algorithms. These algorithms recursively partition the input by using a main-memory index of the same type as the target index to be built. In contrast to previous generic bulk loading algorithms, the implementation of our new algorithms turns out to be much easier. Another advantage is that our new algorithms possess fewer parameters whose settings have to be taken into consideration. An experimental performance comparison is presented where different bulk loading algorithms are investigated in a system-like scenario. Our experiments are unique in the sense that we examine the same code for different index structures (R-tree and Slim-tree). The results consistently indicate that our new algorithms outperform asymptotically worst-case optimal competitors. Moreover, the search quality of the target index will be better when our new bulk loading algorithms are used.
*This work has been supported by grant no. SE 553/2-1 from DFG.
Improved Methods for Signature-Tree Construction
The Computer Journal, 2000
Cited by 14 (4 self)
we locate a number of reasons for this problem and propose several methods for node splitting and partial-tree restructuring, which lead to improved query-response times. We have implemented all methods and we present experimental results, which indicate that the proposed methods are superior in all cases to the standard one and up to 5-10 times better for medium and higher weights in inclusive (partial match) queries. Additionally, we have developed new functions for the performance estimation of signature trees which, in contrast to a previous estimation function, are able to take into account the outcome of different split methods and to provide more accurate estimation
Similarity search in sets and categorical data using the signature tree
In ICDE, 2003
Cited by 13 (1 self)
Data mining applications analyze large collections of set data and high dimensional categorical data. Search on these data types is not restricted to the classic problems of mining association rules and classification, but similarity search is also a frequently applied operation. Access methods for multidimensional numerical data are inappropriate for this problem and specialized indexes are needed. We propose a method that represents set data as bitmaps (signatures) and organizes them into a hierarchical index, suitable for similarity search and other related query types. In contrast to a previous technique, the signature tree is dynamic and does not rely on hardwired constants. Experiments with synthetic and real datasets show that it is robust to different data characteristics, scalable to the database size and efficient for various queries.
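The idea of representing sets as bitmaps can be illustrated with a minimal sketch. This is not the signature tree from the paper: it only shows flat signatures compared by Hamming distance, with no hierarchical index. The signature width `SIG_BITS`, the SHA-256 based bit selection, and the function names are all assumptions made for illustration.

```python
import hashlib
from typing import FrozenSet, List, Sequence, Tuple

SIG_BITS = 64  # assumed signature width, picked only for illustration


def make_signature(items: FrozenSet[str], bits: int = SIG_BITS) -> int:
    """Superimpose one hashed bit per element into a single bitmap."""
    sig = 0
    for item in items:
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        sig |= 1 << (int.from_bytes(digest, "big") % bits)
    return sig


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two signatures differ."""
    return bin(a ^ b).count("1")


def rank_by_similarity(
    query: FrozenSet[str], data: Sequence[FrozenSet[str]]
) -> List[Tuple[int, int]]:
    """Return (distance, index) pairs, closest signature first."""
    q = make_signature(query)
    return sorted((hamming(q, make_signature(s)), i) for i, s in enumerate(data))


if __name__ == "__main__":
    baskets = [
        frozenset({"milk", "bread"}),
        frozenset({"beer", "chips"}),
        frozenset({"milk", "bread", "eggs"}),
    ]
    print(rank_by_similarity(frozenset({"milk", "bread"}), baskets))
```

Because several elements can hash to the same bit, a small Hamming distance between signatures only suggests that the underlying sets are similar; a real system would check candidates against the stored sets.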
Efficient Content-based Indexing of Large Image Databases
ACM Transactions on Information Systems, 2000
Cited by 12 (0 self)
Large image databases have emerged in various applications in recent years. A prime requisite of these databases is the means by which their contents can be indexed and retrieved. A multilevel signature file called the Two Signature Multi-Level Signature File (2SMLSF) is introduced as an efficient access structure for large image databases. The 2SMLSF encodes image information into binary signatures and creates a tree structure that can be efficiently searched to satisfy a user’s query. Two types of signatures are generated. Type I signatures are used at all tree levels except the leaf level and are based only on the domain objects included in the image. Type II signatures, on the other hand, are stored at the leaf level and are based on the included domain objects and their spatial relationships. The 2SMLSF was compared analytically to existing signature file techniques. The 2SMLSF significantly reduces the storage requirements; the index structure can answer more queries; and the 2SMLSF performance significantly improves over current techniques. Both storage reduction and performance improvement increase with the number of objects per image and the number of images in the database. For an example large image database, a storage reduction of 78% may be achieved while the performance improvement may reach 98%.
Hierarchical bitmap index: An efficient and scalable indexing technique for set-valued attributes
In Proc. ADBIS’03, 2003
Cited by 10 (2 self)
Set-valued attributes are convenient to model complex objects occurring in the real world. Currently available database systems support the storage of set-valued attributes in relational tables but contain no primitives to query them efficiently. Queries involving set-valued attributes either perform full scans of the source data or make multiple passes over single-value indexes to reduce the number of retrieved tuples. Existing techniques for indexing set-valued attributes (e.g., inverted files, signature indexes or RD-trees) are not efficient enough to support fast access of set-valued data in very large databases. In this paper we present the hierarchical bitmap index, a novel technique for indexing set-valued attributes. Our index can handle sets of arbitrary length, and its performance is not affected by the size of the indexed domain. The hierarchical bitmap index efficiently supports different classes of queries, including subset, superset and similarity queries. Our experiments show that the hierarchical bitmap index outperforms other set indexing techniques significantly.
Signature-based Structures for Objects with Set-valued Attributes
2002
Cited by 8 (1 self)
Aiming at the efficient retrieval of objects with set-valued attributes, we introduce three variations of a new method in order to satisfy subset and superset queries. Our approach is to combine the advantages of two access methods, linear hashing and tree-shaped methods, on which other similar methods have been previously reported as well. Analytical performance estimation functions for each particular method are presented, followed by a thorough experimental comparison of all investigated structures, where analytical and experimental results deviate by 10% on average. Finally, the results of this performance evaluation are presented and discussed, clearly showing the superiority of the new methods, which reach an improvement of up to 85%.
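How signatures can answer containment queries on set-valued attributes can be sketched as follows. This is a plain sequential signature filter with a verification step, not the hashing and tree based structures the paper proposes. The function names `containing` and `contained_in`, the 64-bit width, and the SHA-256 based bit selection are hypothetical choices made for illustration.

```python
import hashlib
from typing import FrozenSet, List, Sequence


def signature(items: FrozenSet[str], bits: int = 64) -> int:
    """Build a bitmap signature by setting one hashed bit per element."""
    sig = 0
    for item in items:
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        sig |= 1 << (int.from_bytes(digest, "big") % bits)
    return sig


def containing(db: Sequence[FrozenSet[str]], query: FrozenSet[str]) -> List[FrozenSet[str]]:
    """Stored sets that contain every element of the query set."""
    q = signature(query)
    hits = []
    for stored in db:
        # Filter: every bit of the query signature must be set in the stored one.
        if signature(stored) & q == q:
            # Verify: bit collisions can cause false drops, so check the real sets.
            if query <= stored:
                hits.append(stored)
    return hits


def contained_in(db: Sequence[FrozenSet[str]], query: FrozenSet[str]) -> List[FrozenSet[str]]:
    """Stored sets whose elements all appear in the query set."""
    q = signature(query)
    hits = []
    for stored in db:
        s = signature(stored)
        # Filter on signatures first, then verify against the real sets.
        if s & q == s and stored <= query:
            hits.append(stored)
    return hits


if __name__ == "__main__":
    db = [frozenset("abc"), frozenset("ab"), frozenset("xyz")]
    print(containing(db, frozenset("ab")))
    print(contained_in(db, frozenset("abcd")))
```

The signature test can never miss a true match, because a subset always sets a subset of the bits; it can only let through extra candidates, which the set comparison then discards. In practice the signatures would be computed once and stored rather than recomputed per query.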
Incorporating String Search in a Hypertext System: User Interface and Signature File Design Issues
HyperMedia, 1990
Cited by 6 (0 self)
Hypertext systems provide an appealing mechanism for informally browsing databases by traversing selectable links. However, in many fact-finding situations string search is an effective complement to browsing. This paper describes the application of the signature file method to achieve rapid and convenient string search in small personal computer hypertext environments. The method has been implemented in a prototype, as well as in a commercial product. Performance data for search times and storage space are presented from a commercial hypertext database. User interface issues are then discussed. Experience with the string search interface indicates that it was used successfully by novice users.
Address correspondence to: Christos Faloutsos, Computer Science Department, University of Maryland, College Park, MD 20742, USA. Tel: 1 (301) 454-1462, Fax: 1 (301) 454-8346, Email: christos@cs.umd.edu
1 Introduction
Early exploratory hypertext systems are giving way to numerous commercial...
Plug&Join: An easy-to-use generic algorithm for efficiently processing equi and non-equi joins
Cited by 5 (0 self)
This paper presents Plug&Join, a new generic algorithm for efficiently processing a broad class of different types of joins in an extensible database system. Plug&Join is not only designed to support equi joins, temporal joins, spatial joins, subset joins and other types of joins, but in contrast to previous algorithms it can be easily customized and it allows efficient processing of new types of joins that might be of relevance in the near future. Depending on the join predicate (and the data types of the join relations), Plug&Join is called with a suitable type of index structure as a parameter. Fortunately, custom types of index structures can be implemented easily under frameworks like GiST, which simplifies and extends the applicability of our approach. Plug&Join partitions both join relations recursively until each partition of the inner relation fits in main memory. If an inner partition fits in memory, the algorithm builds a memory resident index of the desired type on the inner...
Information Retrieval on the Web: Selected Topics
IBM research, Tokyo Research Laboratory, IBM, 1999
Cited by 4 (0 self)
In this paper we review studies on the growth of the Internet and technologies which are useful for information search and retrieval on the Web. In the first section, we present data on the Internet from several different sources, e.g., current as well as projected number of users, hosts and Web sites. Although the numerical figures vary, the overall trends cited by the sources are consistent and point to exponential growth during the coming decade. Internet users are also increasingly using search engines and search services to find specific information of interest. However, users are not satisfied with the performance of the current generation of search engines; the slow speed of retrieval, communication delays, and poor quality of retrieved results (e.g., noise and broken links) are commonly cited problems. The main body of our paper focuses on linear algebraic models and techniques for solving these problems. keywords: clustering, indexing, information retrieval, Internet, late...

