Results 1 - 10
of
58
Efficient and Effective Clustering Methods for Spatial Data Mining
, 1994
"... Spatial data mining is the discovery of interesting relationships and characteristics that may exist implicitly in spatial databases. In this paper, we explore whether clustering methods have a role to play in spatial data mining. To this end, we develop a new clustering method called CLARANS which ..."
Abstract
-
Cited by 503 (35 self)
- Add to MetaCart
Spatial data mining is the discovery of interesting relationships and characteristics that may exist implicitly in spatial databases. In this paper, we explore whether clustering methods have a role to play in spatial data mining. To this end, we develop a new clustering method called CLARANS which is based on randomized search. We also de- velop two spatial data mining algorithms that use CLARANS. Our analysis and experiments show that with the assistance of CLARANS, these two algorithms are very effective and can lead to discoveries that are difficult to find with current spatial data mining algorithms.
Query Optimization
, 1996
"... Imagine yourself standing in front of an exquisite buffet filled with numerous delicacies. Your goal is to try them all out, but you need to decide in what order. What exchange of tastes will maximize the overall pleasure of your palate? Although much less pleasurable and subjective, that is the typ ..."
Abstract
-
Cited by 102 (2 self)
- Add to MetaCart
Imagine yourself standing in front of an exquisite buffet filled with numerous delicacies. Your goal is to try them all out, but you need to decide in what order. What exchange of tastes will maximize the overall pleasure of your palate? Although much less pleasurable and subjective, that is the type of problem that query optimizers are called to solve. Given a query, there are many plans that a database management system (DBMS) can follow to process it and produce its answer. All plans are equivalent in terms of their final output but vary in their cost, i.e., the amount of time that they need to run. What is the plan that needs the least amount of time? Such query optimization is absolutely necessary in a DBMS. The cost difference between two alternatives can be enormous. For example, consider the following database schema, which will be...
The LDL System Prototype
- IEEE Transactions on Knowledge and Data Engineering
, 1990
"... The LDL system provides a declarative logic-based language and integrates relational database and logic programming technologies so as to support advanced data and knowledge-based applications. This paper contains a comprehensive overview of the system and contains a description of the LDL language ..."
Abstract
-
Cited by 80 (1 self)
- Add to MetaCart
The LDL system provides a declarative logic-based language and integrates relational database and logic programming technologies so as to support advanced data and knowledge-based applications. This paper contains a comprehensive overview of the system and contains a description of the LDL language and the compilation techniques employed to translate LDL queries into target queries on the stored data. The paper further contains a description of the architecture and runtime environment of the system and the optimization techniques employed in order to improve the performance and assure the safety of the compiled queries. The paper concludes with an account of the experience gained so far with the system, and discusses application areas where the LDL approach appears to be particularly effective.
Queries and query processing in object-oriented database systems
- ACM Transactions on Information Systems
, 1990
"... One of the basic functionalities of database management systems (DBMSs) is to be able to process declarative user queries. The first generation of object-oriented DBMSs did not provide declarative query capabilities. However, the last decade has seen significant research in defining query models (in ..."
Abstract
-
Cited by 75 (8 self)
- Add to MetaCart
One of the basic functionalities of database management systems (DBMSs) is to be able to process declarative user queries. The first generation of object-oriented DBMSs did not provide declarative query capabilities. However, the last decade has seen significant research in defining query models (including calculi, algebra and user languages) and in techniques for processing and optimizing them. Many of the current commercial systems provide at least rudimentary query capabilities. In this chapter we discuss the techniques that have been developed for processing object-oriented queries. Our particular emphasis is on extensible query processing architectures and techniques. The other chapters in this book on query languages and optimization techniques complement this chapter. 1
Heuristic and Randomized Optimization for the Join Ordering Problem
- VLDB Journal
, 1997
"... Recent developments in database technology, such as deductive database systems, have given rise to the demand for new, cost-effective optimization techniques for join expressions. In this paper many different algorithms that compute approximate solutions for optimizing join orders are studied since ..."
Abstract
-
Cited by 56 (2 self)
- Add to MetaCart
Recent developments in database technology, such as deductive database systems, have given rise to the demand for new, cost-effective optimization techniques for join expressions. In this paper many different algorithms that compute approximate solutions for optimizing join orders are studied since traditional dynamic programming techniques are not appropriate for complex problems. First, two possible solution spaces, the space of left-deep and bushy processing trees, respectively, are evaluated from a statistical point of view. The result is that the common limitation to leftdeep processing trees is only advisable for certain join graph types. Basically, optimizers from three classes are analysed: heuristic, randomized and genetic algorithms. Each one is extensively scrutinized with respect to its working principle and its fitness for the desired application. It turns out that randomized and genetic algorithms are well suited for optimizing join expressions. They generate solutions of...
CLARANS: A Method for Clustering Objects for Spatial Data Mining
- IEEE Transactions on Knowledge and Data Engineering
, 2005
"... Abstract—Spatial data mining is the discovery of interesting relationships and characteristics that may exist implicitly in spatial databases. To this end, this paper has three main contributions. First, we propose a new clustering method called CLARANS, whose aim is to identify spatial structures t ..."
Abstract
-
Cited by 56 (0 self)
- Add to MetaCart
Abstract—Spatial data mining is the discovery of interesting relationships and characteristics that may exist implicitly in spatial databases. To this end, this paper has three main contributions. First, we propose a new clustering method called CLARANS, whose aim is to identify spatial structures that may be present in the data. Experimental results indicate that, when compared with existing clustering methods, CLARANS is very efficient and effective. Second, we investigate how CLARANS can handle not only points objects, but also polygon objects efficiently. One of the methods considered, called the IR-approximation, is very efficient in clustering convex and nonconvex polygon objects. Third, building on top of CLARANS, we develop two spatial data mining algorithms that aim to discover relationships between spatial and nonspatial attributes. Both algorithms can discover knowledge that is difficult to find with existing spatial data mining algorithms. Index Terms—Spatial data mining, clustering algorithms, randomized search, computational geometry. æ 1
Cost-based Query Scrambling for Initial Delays
, 1998
"... Remote data access from disparate sources across a wide-area network such as the Internet is problematic due to the unpredictable nature of the communications medium and the lack of knowledge about the load and potential delays at remote sites. Traditional, static, query processing approaches break ..."
Abstract
-
Cited by 51 (2 self)
- Add to MetaCart
Remote data access from disparate sources across a wide-area network such as the Internet is problematic due to the unpredictable nature of the communications medium and the lack of knowledge about the load and potential delays at remote sites. Traditional, static, query processing approaches break down in this environment because they are unable to adapt in response to unexpected delays. Query scrambling has been proposed to address this problem. Scrambling modifies query execution plans on-the-fly when delays are encountered during runtime. In its original formulation, scrambling was based on simple heuristics, which although providing good performance in many cases, were also shown to be susceptible to problems resulting from bad scrambling decisions. In this paper we address these shortcomings by investigating ways to exploit query optimization technology to aid in making intelligent scrambling choices. We propose three different approaches to using query optimization for scramblin...
Optimizing Large Join Queries in Mediation Systems
- International Conference on Database Theory (ICDT
, 1999
"... . In data integration systems, queries posed to a mediator need to be translated into a sequence of queries to the underlying data sources. In a heterogeneous environment, with sources of diverse and limited query capabilities, not all the translations are feasible. In this paper, we study the probl ..."
Abstract
-
Cited by 37 (11 self)
- Add to MetaCart
. In data integration systems, queries posed to a mediator need to be translated into a sequence of queries to the underlying data sources. In a heterogeneous environment, with sources of diverse and limited query capabilities, not all the translations are feasible. In this paper, we study the problem of finding feasible and efficient query plans for mediator systems. We consider conjunctive queries on mediators and model the source capabilities through attribute-binding adornments. We use a simple cost model that focuses on the major costs in mediation systems, those involved with sending queries to sources and getting answers back. Under this metric, we develop two algorithms for source query sequencing -- one based on a simple greedy strategy and another based on a partitioning scheme. The first algorithm produces optimal plans in some scenarios, and we show a linear bound on its worst case performance when it misses optimal plans. The second algorithm generates optimal plans in mor...
Physical Data Independence, Constraints, and Optimization with Universal Plans
, 1999
"... We present an optimization method and algorithm designed for three objectives: physical data independence, semantic optimization, and generalized tableau minimization. The method relies on generalized forms of chase and "backchase" with constraints (dependencies). By using dictionaries (finite funct ..."
Abstract
-
Cited by 36 (10 self)
- Add to MetaCart
We present an optimization method and algorithm designed for three objectives: physical data independence, semantic optimization, and generalized tableau minimization. The method relies on generalized forms of chase and "backchase" with constraints (dependencies). By using dictionaries (finite functions) in physical schemas we can capture with constraints useful access structures such as indexes, materialized views, source capabilities, access support relations, gmaps, etc. The search space for query plans is de ned and enumerated in a novel manner: the chase phase rewrites the original query into a "universal" plan that integrates all the access structures and alternative pathways that are allowed by applicable constraints. Then, the backchase phase produces optimal plans by eliminating various combinations of redundancies, again according to constraints. This method is applicable (sound) to a large class of queries, physical access structures, and semantic constraints. We prove that it is in fact complete for "path-conjunctive" queries and views with complex objects, classes and dictionaries, going beyond previous theoretical work on processing queries using materialized views.
Iterative Dynamic Programming: A New Class of Query Optimization Algorithms
- ACM Trans. on Database Systems
, 1998
"... The query optimizer is one of the most important components of a database system. Most commercial query optimizers today are based on a dynamic-programming algorithm, as proposed in [SAC+79]. While this algorithm produces good optimization results (i.e., good plans), its high complexity can be prohi ..."
Abstract
-
Cited by 36 (5 self)
- Add to MetaCart
The query optimizer is one of the most important components of a database system. Most commercial query optimizers today are based on a dynamic-programming algorithm, as proposed in [SAC+79]. While this algorithm produces good optimization results (i.e., good plans), its high complexity can be prohibitive if complex queries need to be processed, new query execution techniques need to be integrated, or in certain programming environments (e.g., distributed database systems). In this paper, we present and thoroughly evaluate a new class of query optimization algorithms that are based on a principle that we call iterative dynamic programming, or IDP for short. IDP has several important advantages: First, IDP-algorithms produce the best plans of all known algorithms in situations in which dynamic programming is not viable because of its high complexity. Second, some IDP variants are adaptive and produce as good plans as dynamic programming if dynamic programming is viable an...

