An Overview of Query Optimization in Relational Systems (1998)

by Surajit Chaudhuri
Venue:In PODS
Results 1 - 10 of 71

A multidimensional workload-aware histogram

by Nicolas Bruno, Surajit Chaudhuri, Luis Gravano - In SIGMOD, 2001
Cited by 97 (9 self)
Abstract not found

Optimizing Recursive Information Gathering Plans

by Eric Lambrecht, Subbarao Kambhampati, 1999
Cited by 50 (10 self)
In this paper we describe two optimization techniques that are specially tailored for information gathering. The first is a greedy minimization algorithm that minimizes an information gathering plan by removing redundant and overlapping information sources without loss of completeness. We then discuss a set of...

Combining Histograms and Parametric Curve Fitting for Feedback-Driven Query Result-Size Estimation

by Arnd Christian König, Gerhard Weikum - VLDB Conference, 1999
Cited by 30 (1 self)
This paper aims to improve the accuracy of query result-size estimations in query optimizers by leveraging the dynamic feedback obtained from observations on the executed query workload. To this end, an approximate "synopsis" of data-value distributions is devised that combines histograms with parametric curve fitting, leading to a specific class of linear splines. The approach reconciles the benefits of histograms, simplicity and versatility, with those of parametric techniques, especially the adaptivity to statistically biased and dynamically evolving query workloads. The paper

Distributed Aggregation for Data-Parallel Computing: Interfaces and Implementations

by Yuan Yu, Pradeep Kumar Gunda, Michael Isard
Cited by 15 (1 self)
Data-intensive applications are increasingly designed to execute on large computing clusters. Grouped aggregation is a core primitive of many distributed programming models, and it is often the most efficient available mechanism for computations such as matrix multiplication and graph traversal. Such algorithms typically require nonstandard aggregations that are more sophisticated than traditional built-in database functions such as Sum and Max. As a result, the ease of programming user-defined aggregations, and the efficiency of their implementation, is of great current interest. This paper evaluates the interfaces and implementations for user-defined aggregation in several state-of-the-art distributed computing systems: Hadoop, databases such as Oracle Parallel Server, and DryadLINQ. We show that: the degree of language integration between user-defined functions and the high-level query language has an impact on code legibility and simplicity; the choice of programming interface has a material effect on the performance of computations; some execution plans perform better than others on average; and that in order to get good performance on a variety of workloads a system must be able to select between execution plans depending on the computation. The interface and execution plan described in the MapReduce paper, and implemented by Hadoop, are found to be among the worst-performing choices.
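To make the notion of a user-defined grouped aggregation concrete, here is a minimal single-process sketch of a decomposable aggregate (a per-key mean) split into local partial aggregation and a final merge. It is an illustration only and not the interface of Hadoop, Oracle Parallel Server, or DryadLINQ; the names `initial`, `combine`, `final`, `partial_aggregate`, and `merge_partials` are assumptions made for this example.

```python
from typing import Dict, Iterable, List, Tuple

# Partial state for a mean: (running sum, count). Hypothetical representation.
State = Tuple[float, int]


def initial(value: float) -> State:
    """Turn one input value into a partial state."""
    return (value, 1)


def combine(a: State, b: State) -> State:
    """Merge two partial states; associative and commutative."""
    return (a[0] + b[0], a[1] + b[1])


def final(state: State) -> float:
    """Produce the user-visible result from a fully merged state."""
    return state[0] / state[1]


def partial_aggregate(records: Iterable[Tuple[str, float]]) -> Dict[str, State]:
    """Aggregate one partition locally, before any data would be shuffled."""
    out: Dict[str, State] = {}
    for key, value in records:
        state = initial(value)
        out[key] = combine(out[key], state) if key in out else state
    return out


def merge_partials(partials: List[Dict[str, State]]) -> Dict[str, float]:
    """Merge the per-partition states by key and finalize each group."""
    merged: Dict[str, State] = {}
    for partial in partials:
        for key, state in partial.items():
            merged[key] = combine(merged[key], state) if key in merged else state
    return {key: final(state) for key, state in merged.items()}


# Two in-memory lists stand in for partitions held on different machines.
partitions = [
    [("a", 1.0), ("b", 4.0), ("a", 3.0)],
    [("a", 5.0), ("b", 6.0)],
]
result = merge_partials([partial_aggregate(p) for p in partitions])
```

Because `combine` is associative and commutative, each partition can be reduced independently and only the small per-key states need to be exchanged, which is one of the plan choices a system could select depending on the computation.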

Graphs-at-a-time: Query Language and Access Methods for Graph Databases

by Huahai He, Ambuj K. Singh, 2008
Cited by 15 (0 self)
With the prevalence of graph data in a variety of domains, there is an increasing need for a language to query and manipulate graphs with heterogeneous attributes and structures. We propose a query language for graph databases that supports arbitrary attributes on nodes, edges, and graphs. In this language, graphs are the basic unit of information and each query manipulates one or more collections of graphs. To allow for flexible compositions of graph structures, we extend the notion of formal languages from strings to the graph domain. We present a graph algebra extended from the relational algebra in which the selection operator is generalized to graph pattern matching and a composition operator is introduced for rewriting matched graphs. Then, we investigate access methods of the selection operator. Pattern matching over large graphs is challenging due to the NP-completeness of subgraph isomorphism. We address this by a combination of techniques: use of neighborhood subgraphs and profiles, joint reduction of the search space, and optimization of the search order. Experimental results on real and synthetic large graphs demonstrate that our graph specific optimizations outperform an SQL-based implementation by orders of magnitude.
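As a rough illustration of selection by graph pattern matching, the sketch below enumerates injective, edge-preserving mappings of a small labeled pattern into a data graph by backtracking. It is not the authors' algorithm: a label-and-degree filter stands in for their neighborhood profiles, and ordering pattern nodes by candidate count stands in for their search-order optimization. All names (`find_matches`, `Graph`, the node identifiers) are hypothetical.

```python
from typing import Dict, List, Set

Graph = Dict[str, Set[str]]  # undirected graph as adjacency sets


def find_matches(pattern: Graph, p_labels: Dict[str, str],
                 data: Graph, d_labels: Dict[str, str]) -> List[Dict[str, str]]:
    """Return every injective mapping of pattern nodes to data nodes that
    preserves labels and maps each pattern edge onto a data edge."""
    # Local pruning: keep data nodes with the same label and enough neighbours.
    candidates = {
        u: [v for v in data
            if d_labels[v] == p_labels[u] and len(data[v]) >= len(pattern[u])]
        for u in pattern
    }
    # Simple search order: most constrained pattern node first.
    order = sorted(pattern, key=lambda u: len(candidates[u]))
    results: List[Dict[str, str]] = []

    def extend(i: int, mapping: Dict[str, str]) -> None:
        if i == len(order):
            results.append(dict(mapping))
            return
        u = order[i]
        for v in candidates[u]:
            if v in mapping.values():
                continue  # keep the mapping injective
            # Every already-mapped neighbour of u must map to a neighbour of v.
            if all(mapping[w] in data[v] for w in pattern[u] if w in mapping):
                mapping[u] = v
                extend(i + 1, mapping)
                del mapping[u]

    extend(0, {})
    return results


# Pattern: a single edge between a node labeled "A" and a node labeled "B".
pattern = {"p1": {"p2"}, "p2": {"p1"}}
p_labels = {"p1": "A", "p2": "B"}

# Data graph with edges d1-d2, d1-d3, d3-d4.
data = {"d1": {"d2", "d3"}, "d2": {"d1"}, "d3": {"d1", "d4"}, "d4": {"d3"}}
d_labels = {"d1": "A", "d2": "B", "d3": "B", "d4": "A"}

matches = find_matches(pattern, p_labels, data, d_labels)
```

The worst case remains exponential, consistent with the NP-completeness of subgraph isomorphism noted in the abstract; the filters only shrink the search space in practice.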

Combining Fragmentation and Encryption to Protect Privacy in Data Storage

by Valentina Ciriani, Sabrina De Capitani di Vimercati, Sara Foresti, Sushil Jajodia, Stefano Paraboschi, Pierangela Samarati
Cited by 12 (10 self)
The impact of privacy requirements in the development of modern applications is increasing very quickly. Many commercial and legal regulations are driving the need to develop reliable solutions for protecting sensitive information whenever it is stored, processed, or communicated to external parties. To this purpose, encryption techniques are currently used in many scenarios where data protection is required since they provide a layer of protection against the disclosure of personal information, which safeguards companies from the costs that may arise from exposing their data to privacy breaches. However, dealing with encrypted data may make query processing more expensive. In this paper, we address these issues by proposing a solution to enforce privacy of data collections that combines data fragmentation with encryption. We model privacy requirements as confidentiality constraints expressing the sensitivity of attributes and their associations. We then use encryption as an underlying (conveniently available) measure for making data unintelligible, while exploiting fragmentation as a way to break sensitive associations among attributes. We formalize the problem of minimizing the impact of fragmentation in terms of number of fragments and their affinity and present two heuristic algorithms for solving such problems. We also discuss

An Evolutionary Approach to Materialized Views Selection in a Data Warehouse Environment

by Chuan Zhang, Xin Yao, Jian Yang - IEEE Trans. Syst., Man, Cybern., 2001
Cited by 11 (2 self)
A data warehouse (DW) contains multiple views accessed by queries. One of the most important decisions in designing a DW is selecting views to materialize for the purpose of efficiently supporting decision making. The search space for possible materialized views is exponentially large. Therefore heuristics have been used to search for a near optimal solution. In this paper, we explore the use of an evolutionary algorithm for materialized view selection based on multiple global processing plans for queries. We apply a hybrid evolutionary algorithm to solve three related problems. The first is to optimize queries. The second is to choose the best global processing plan from multiple global processing plans. The third is to select materialized views from a given global processing plan. Our experiment shows that the hybrid evolutionary algorithm delivers better performance than either the evolutionary algorithm or heuristics used alone in terms of the minimal query and maintenance cost and the evaluation cost to obtain the minimal cost.

Scheduling multiple data visualization query workloads on a shared memory machine

by Henrique Andrade, Tahsin Kurc, Alan Sussman, Joel Saltz - In Proceedings of the 2002 IEEE International Parallel and Distributed Processing Symposium (IPDPS 2002), Fort Lauderdale, FL, 2002
Cited by 11 (10 self)
Abstract not found

Interprocedural query extraction for transparent persistence

by Ben Wiedermann, Ali Ibrahim, William R. Cook - In Proc. of ACM Conf. on Object-Oriented Programming, Systems, Languages and Applications (OOPSLA), 2008
Cited by 11 (3 self)
Transparent persistence promises to integrate programming languages and databases by allowing procedural programs to access persistent data with the same ease as non-persistent data. Transparent persistence is more likely to be adopted if it leverages the performance and transaction management of relational databases. Since creating good relational queries from procedural programs is hard, most practical systems compromise transparency to achieve performance. In this work we demonstrate the practical feasibility of a technique for extracting relational queries from object-oriented programs. A program analysis derives query structure and conditions across methods that access persistent data. The system combines static analysis and runtime query composition to handle procedures that return persistent values. Our prototype Java compiler implements the analysis, and handles recursion and parameterized queries. We evaluate the effectiveness of the optimization on the OO7 and TORPEDO benchmarks, showing that automatic optimizations are in some cases as efficient as hand-tuned code.

Orchid: Integrating Schema Mapping and ETL

by Stefan Dessloch, Mauricio A. Hernández, Ryan Wisnesky, Ahmed Radwan, Jindan Zhou
Cited by 10 (2 self)
This paper describes Orchid, a system that converts declarative mapping specifications into data flow specifications (ETL jobs) and vice versa. Orchid provides an abstract operator model that serves as a common model for both transformation paradigms; both mappings and ETL jobs are transformed into instances of this common model. As an additional benefit, instances of this common model can be optimized and deployed into multiple target environments. Orchid is being deployed in FastTrack, a data transformation toolkit in IBM Information Server.