Query evaluation techniques for large databases
- ACM COMPUTING SURVEYS
, 1993
Cited by 592 (7 self)
Database management systems will continue to manage large data volumes. Thus, efficient algorithms for accessing and manipulating large sets and sequences will be required to provide acceptable performance. The advent of object-oriented and extensible database systems will not solve this problem. On the contrary, modern data models exacerbate it: In order to manipulate large sets of complex objects as efficiently as today’s database systems manipulate simple records, query processing algorithms and software will become more complex, and a solid understanding of algorithm and architectural issues is essential for the designer of database management software. This survey provides a foundation for the design and implementation of query execution facilities in new database management systems. It describes a wide array of practical query evaluation techniques for both relational and post-relational database systems, including iterative execution of complex query evaluation plans, the duality of sort- and hash-based set matching algorithms, types of parallel query execution and their implementation, and special operators for emerging database application domains.
A survey of information retrieval and filtering methods
, 1995
Cited by 82 (0 self)
We survey the major techniques for information retrieval. In the first part, we provide an overview of the traditional ones (full text scanning, inversion, signature files and clustering). In the second part we discuss attempts to include semantic information (natural language processing, latent semantic indexing and neural networks).
Evaluation of Main Memory Join Algorithms for Joins with Subset Join Predicates
- In Proc. of the Conf. on Very Large Data Bases (VLDB)
, 1997
Cited by 35 (4 self)
Current data models like the NF² model and object-oriented models support set-valued attributes. Hence, it becomes possible to have join predicates based on set comparison. This paper introduces and evaluates two main memory algorithms to evaluate efficiently this kind of join. More specifically, we concentrate on subset predicates. 1 Introduction Since the invention of relational database systems, tremendous effort has been undertaken in order to develop efficient join algorithms. Starting from a simple nested-loop join algorithm, the first improvement was the introduction of the merge join [1]. Later, the hash join [2, 7] and its improvements [20, 23, 28, 39] became alternatives to the merge join. For overviews see [27, 37] and for a comparison between the sort-merge and hash joins see [13, 14]. A lot of effort has also been spent on parallelizing join algorithms based on sorting [10, 25, 26, 34] and hashing [6, 12, 36]. Another important research area is the development of inde...
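As a rough illustration of a join with a subset predicate over set-valued attributes, the sketch below runs a nested-loop join that uses bitmask signatures as a cheap pre-filter before the exact set comparison. This is a generic textbook-style sketch, not the two algorithms the paper introduces; the names `signature`, `subset_join`, and the width `SIG_BITS` are hypothetical choices made for this example.

```python
from typing import Hashable, Iterable, List, Set, Tuple

SIG_BITS = 64  # assumed signature width (hypothetical, chosen for illustration)


def signature(elements: Iterable[Hashable]) -> int:
    """Superimpose one bit per element into a fixed-width bitmask."""
    sig = 0
    for x in elements:
        sig |= 1 << (hash(x) % SIG_BITS)
    return sig


def subset_join(r: List[Set[Hashable]], s: List[Set[Hashable]]) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) such that r[i] is a subset of s[j].

    If a is a subset of b, every bit set in signature(a) is also set in
    signature(b). The bitmask test can therefore reject pairs cheaply;
    pairs that pass it are confirmed with the exact comparison because
    signatures may produce false positives.
    """
    s_sigs = [(signature(b), b) for b in s]
    result = []
    for i, a in enumerate(r):
        sig_a = signature(a)
        for j, (sig_b, b) in enumerate(s_sigs):
            if (sig_a & ~sig_b) == 0 and a <= b:
                result.append((i, j))
    return result
```

The signature test never discards a true match; it only saves exact comparisons for pairs that cannot match.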
Efficient similarity search for market basket data
- VLDB Journal
, 2002
Cited by 14 (4 self)
Several organizations have developed very large market basket databases for the maintenance of customer transactions. New applications, e.g., Web recommendation systems, present the requirement for processing similarity queries in market basket databases. In this paper, we propose a novel scheme for similarity search queries in basket data. We develop a new representation method, which, in contrast to existing approaches, is proven to provide correct results. New algorithms are proposed for the processing of similarity queries. Extensive experimental results, for a variety of factors, illustrate the superiority of the proposed scheme over the state-of-the-art method.
Improved Methods for Signature-Tree Construction
- The Computer Journal
, 2000
Cited by 14 (4 self)
we locate a number of reasons for this problem and propose several methods for node splitting and partial-tree restructuring, which lead to improved query-response times. We have implemented all methods and we present experimental results, which indicate that the proposed methods are superior in all cases to the standard one and up to 5-10 times better for medium and higher weights in inclusive (partial match) queries. Additionally, we have developed new functions for the performance estimation of signature trees which, in contrast to a previous estimation function, are able to take into account the outcome of different split methods and to provide more accurate estimation
The Advanced Uncertain Reasoning Architecture, AURA
- University of Canterbury
, 1995
Cited by 10 (4 self)
The ADAM binary neural network, which has been used for image analysis applications, is constructed around a central component termed a Correlation Matrix Memory (CMM). A recent reexamination of the CMM has led to development of the Advanced Uncertain Reasoning Architecture (AURA). AURA inherits many useful characteristics from ADAM, but is intended for applications requiring the manipulation of symbolic knowledge. This paper shows how the AURA architecture has been developed from ADAM and explains its method of operation. The paper also outlines the use of AURA in symbolic processing applications, and highlights some of the ways in which the AURA approach is superior to other methods. 1 Introduction The ADAM neural network [Austin 1987] is a binary network used for image analysis applications which is constructed around a central component known as a Correlation Matrix Memory (CMM) network. Our recent work has been examining the CMM networks in ADAM and has developed a more thorough und...
Signature-based Structures for Objects with Set-valued Attributes
, 2002
Cited by 8 (1 self)
Aiming at the efficient retrieval of objects with set-valued attributes, we introduce three variations of a new method in order to satisfy subset and superset queries. Our approach is to combine the advantages of two access methods, that of linear Hashing and of tree-shaped methods, on which other similar methods have been previously reported as well. Performance estimation analytical functions for each particular method are presented, followed by a thorough experimental comparison of all investigated structures, where analytical and experimental results deviate 10% on the average. Finally, the results of this performance evaluation are presented and discussed, clearly showing the superiority of the new methods reaching an improvement of up to 85%.
Hyperion: High Volume Stream Archival for Retrospective Querying
, 2006
Cited by 7 (1 self)
Network monitoring systems that support data archiving and after-the-fact (retrospective) queries are useful for a multitude of purposes, such as anomaly detection and network and security forensics. Data archiving for such systems, however, is complicated by (a) data arrival rates, which may be hundreds of thousands of packets per second on a single link, and (b) the need for online indexing of this data to support retrospective queries. At these data rates, both common database index structures and general-purpose file systems perform poorly. This paper describes Hyperion, a system for archiving, indexing, and on-line retrieval of high-volume data streams. We employ a write-optimized stream file system for high-speed storage of simultaneous data streams, and a novel use of signature file indexes in a distributed multi-level index. We implement Hyperion on commodity hardware and conduct a detailed evaluation using synthetic data and real network traces. Our streaming file system, StreamFS, is shown to be fast enough to archive traces at over a million packets per second. The index allows queries over hours of data to complete in as little as 10-20 seconds, and the entire system is able to index and archive over 200,000 packets/sec while processing simultaneous on-line queries.
Incorporating String Search in a Hypertext System: User Interface and Signature File Design Issues
- HyperMedia
, 1990
Cited by 6 (0 self)
Hypertext systems provide an appealing mechanism for informally browsing databases by traversing selectable links. However, in many fact finding situations string search is an effective complement to browsing. This paper describes the application of the signature file method to achieve rapid and convenient string search in small personal computer hypertext environments. The method has been implemented in a prototype, as well as in a commercial product. Performance data for search times and storage space are presented from a commercial hypertext database. User interface issues are then discussed. Experience with the string search interface indicates that it was used successfully by novice users. Address correspondence to: Christos Faloutsos Computer Science Department, University of Maryland, College Park, MD 20742, USA Tel: 1 (301) 454-1462 -- Fax: 1 (301) 454-8346 -- Email: christos@cs.umd.edu 1 Introduction Early exploratory hypertext systems are giving way to numerous commercial...
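The signature file method mentioned in this abstract can be sketched roughly as follows: each text block gets a fixed-width bit signature built by superimposing a few hashed bits per word, and a query first checks signatures before confirming against the text. This is a minimal whole-word sketch under assumed parameters, not the paper's design; `SIG_BITS`, `BITS_PER_WORD`, `word_bits`, `block_signature`, and `SignatureFile` are hypothetical names introduced for this example.

```python
import re
import zlib
from typing import Iterable, List

SIG_BITS = 128     # assumed signature width (hypothetical)
BITS_PER_WORD = 3  # assumed number of bits set per word (hypothetical)


def tokenize(text: str) -> List[str]:
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def word_bits(word: str) -> int:
    """Hash a word to a bitmask with up to BITS_PER_WORD bits set."""
    mask = 0
    for k in range(BITS_PER_WORD):
        h = zlib.crc32(f"{k}:{word.lower()}".encode("utf-8"))
        mask |= 1 << (h % SIG_BITS)
    return mask


def block_signature(text: str) -> int:
    """Superimpose (bitwise OR) the bit patterns of all words in a block."""
    sig = 0
    for w in tokenize(text):
        sig |= word_bits(w)
    return sig


class SignatureFile:
    def __init__(self, blocks: Iterable[str]) -> None:
        self.blocks = list(blocks)
        self.sigs = [block_signature(b) for b in self.blocks]

    def search(self, word: str) -> List[int]:
        """Return indices of blocks that contain the word.

        The signature check rejects most non-matching blocks without
        reading their text; surviving candidates are verified because
        superimposed coding can yield false positives.
        """
        q = word_bits(word)
        w = word.lower()
        return [
            i
            for i, sig in enumerate(self.sigs)
            if (sig & q) == q and w in tokenize(self.blocks[i])
        ]
```

For instance, `SignatureFile(["Signature files index text"]).search("signature")` returns the index of the single block, while a word absent from every block returns an empty list.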
Information Retrieval on the Web: Selected Topics
- IBM research, Tokyo Research Laboratory, IBM
, 1999
Cited by 4 (0 self)
In this paper we review studies on the growth of the Internet and technologies which are useful for information search and retrieval on the Web. In the first section, we present data on the Internet from several different sources, e.g., current as well as projected number of users, hosts and Web sites. Although the numerical figures vary, the overall trends cited by the sources are consistent and point to exponential growth during the coming decade. And Internet users are increasingly using search engines and search services to find specific information of interest. However, users are not satisfied with the performance of the current generation of search engines; the slow speed of retrieval, communication delays, and poor quality of retrieved results (e.g., noise and broken links) are commonly cited problems. The main body of our paper focuses on linear algebraic models and techniques for solving these problems. keywords: clustering, indexing, information retrieval, Internet, late...

