Offering a Precision-Performance Tradeoff for Aggregation Queries over Replicated Data
2000
Cited by 80 (8 self)
Strict consistency of replicated data is infeasible or not required by many distributed applications, so current systems often permit stale replication, in which cached copies of data values are allowed to become out of date. Queries over cached data return an answer quickly, but the stale answer may be unboundedly imprecise. Alternatively, queries over remote master data return a precise answer, but with potentially poor performance. To bridge the gap between these two extremes, we propose a new class of replication systems called TRAPP (Tradeoff in Replication Precision and Performance). TRAPP systems give each user fine-grained control over the tradeoff between precision and performance: Caches store ranges that are guaranteed to bound the current data values, instead of storing stale exact values. Users supply a quantitative precision constraint along with each query. To answer a query, TRAPP systems automatically select a combination of locally cached bounds and exact master data stored remotely to deliver a bounded answer consisting of a range that is no wider than the specified precision constraint, that is guaranteed to contain the precise answer, and that is computed as quickly as possible. This paper defines the architecture of TRAPP replication systems and covers some mechanics of caching data ranges. It then focuses on queries with aggregation, presenting optimization algorithms for answering queries with precision constraints, and reporting on performance experiments that demonstrate the fine-grained control of the precision-performance tradeoff offered by TRAPP systems.
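The idea of a bounded answer under a precision constraint can be illustrated with a small sketch for a SUM query. It is not the paper's optimization algorithm: it assumes every remote fetch costs the same, so refreshing the widest cached ranges first needs the fewest fetches. The names `bounded_sum`, `fetch_exact`, and `max_width` are hypothetical and do not come from the abstract.

```python
from typing import Callable, Dict, List, Tuple

Bound = Tuple[float, float]  # (low, high) range guaranteed to contain the master value


def bounded_sum(
    cache: Dict[str, Bound],
    fetch_exact: Callable[[str], float],
    max_width: float,
) -> Tuple[Bound, List[str]]:
    """Return a range for SUM that is no wider than max_width.

    Assumption (not from the source): every remote fetch has the same cost,
    so the widest cached ranges are refreshed first.
    """
    bounds = dict(cache)  # work on a copy; the caller's cache is left untouched
    refreshed: List[str] = []
    # Width of the SUM range is the sum of the widths of the item ranges.
    width = sum(hi - lo for lo, hi in bounds.values())
    widest_first = sorted(
        bounds, key=lambda k: bounds[k][1] - bounds[k][0], reverse=True
    )
    for key in widest_first:
        if width <= max_width:
            break  # the precision constraint is already met
        lo, hi = bounds[key]
        value = fetch_exact(key)  # hypothetical call to the remote master copy
        bounds[key] = (value, value)  # an exact value is a zero-width range
        width -= hi - lo
        refreshed.append(key)
    low = sum(lo for lo, _ in bounds.values())
    high = sum(hi for _, hi in bounds.values())
    return (low, high), refreshed
```

With cached ranges `{"a": (0, 10), "b": (5, 6), "c": (2, 4)}` and a constraint of width 3, only `a` is fetched from the master; a looser constraint fetches nothing, and a constraint of 0 forces every item to be fetched.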
Fast Approximate Answers to Aggregate Queries on a Data Cube
In Proc. of the 1999 Intl. Conf. on Scientific and Statistical Database Management, 1999
Cited by 36 (3 self)
Modern decision support systems require very quick (interactive) responses from the DBMS, but pose complex queries on large volumes of data. In this paper, we present a novel solution to this problem: we precompute concise histogram statistics on the data to answer the queries quickly, but approximately. Our hypothesis is that many decision support applications can tolerate small errors in query results in return for large reductions in response times. In particular, we propose the use of multiple histograms to approximate the data cube and answer aggregate queries approximately using this summarized data. We enhance histograms to estimate the quality of the approximate answers. We primarily explore the interaction among various histograms on the data cube in order to minimize the space needed when an upper bound on the errors is given. Our main contribution in this paper is an efficient technique for selecting a provably near-optimal set of histograms on the data cube. Extensive ex...
Discovery of General Knowledge in Large Spatial Databases
1993
Cited by 30 (4 self)
Extraction of interesting and general knowledge from large spatial databases is an important task in the development of spatial data- and knowledge-base systems. In this paper, we investigate knowledge discovery in spatial databases and develop a generalization-based knowledge discovery mechanism which integrates attribute-oriented induction on nonspatial data and spatial merge and generalization on spatial data. The study shows that knowledge discovery has wide applications in spatial databases, and relatively efficient algorithms can be developed for discovery of general knowledge in large spatial databases.
Resource and Knowledge Discovery in Global Information Systems: A Preliminary Design and Experiment
In Proc. of the First Int'l Conference on Knowledge Discovery and Data Mining, 1995
Cited by 17 (9 self)
With huge amounts of information connected to the global information network (Internet), efficient and effective discovery of resource and knowledge from the "global information base" has become an imminent research issue, especially with the advent of the Information SuperHighway. In this article, a multiple layered database (MLDB) approach is proposed to handle the resource and knowledge discovery in global information base. A preliminary experiment using on-line technical reports, a representative subset of the Internet, shows the advantages of such an approach. A multiple layered database is a database formed by generalization and transformation of the information, layer-by-layer, starting from the original information base (treated as layer-0, the primitive layer). Information retrieval, data mining, and data analysis techniques can be used to extract and transform information from a lower layer database to a higher one. Layer-1 and higher layers of an MLDB can be modeled by an e...
Knowledge Discovery in Object-Oriented and Active Databases
1994
Cited by 16 (7 self)
Knowledge discovery in databases (or data mining), which extracts interesting knowledge from large databases, represents an important direction in the development of data- and knowledge-base systems. With fruitful research results on knowledge discovery in relational databases and the emerging trend in the development of object-oriented and active database systems, it is natural to investigate knowledge discovery in object-oriented and active databases. This paper overviews the mechanisms for knowledge discovery in object-oriented and active database systems, with an emphasis on the techniques for generalization of complex data objects, methods, class hierarchies and dynamically evolving data, and on the integration of knowledge discovery mechanisms with production control processes. 1 Introduction With rapid growth in the amount of information stored in databases, the development of effective and efficient tools for knowledge discovery in databases (KDD, or data/knowledge mining) ...
Resource and Knowledge Discovery in Global Information Systems: A Scalable Multiple Layered Database Approach
In Proc. of the First Int'l Conference on Knowledge Discovery and Data Mining, 1995
Cited by 15 (11 self)
With huge amounts of information connected to the global information network (Internet), efficient and effective discovery of resource and knowledge from the "global information base" has become an imminent research issue, especially with the advent of the Information SuperHighway. In
Generalization-Based Data Mining in Object-Oriented Databases Using an Object Cube Model
Data and Knowledge Engineering, 1998
Cited by 13 (1 self)
Data mining is the discovery of knowledge and useful information from the large amounts of data stored in databases. With the increasing popularity of object-oriented database systems in advanced database applications, it is important to study the data mining methods for object-oriented databases because mining knowledge from such databases may improve understanding, organization, and utilization of the data stored there.
Informix under CONTROL: Online query processing
Data Mining and Knowledge Discovery Journal, 2000
Cited by 10 (2 self)
The goal of the CONTROL project at Berkeley is to develop systems for interactive analysis of large data sets. We focus on systems that provide users with iteratively refining answers to requests and online control of processing, thereby tightening the loop in the data analysis process. This paper presents the database-centric subproject of CONTROL: a complete online query processing facility, implemented in a commercial Object-Relational DBMS from Informix. We describe the algorithms at the core of the system, and detail the end-to-end issues required to bring the algorithms together and deliver a complete system.
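An "iteratively refining answer" can be pictured with a generic running-average sketch. This is not the CONTROL or Informix implementation: it assumes rows can be read in random order and uses a plain normal-approximation interval with no finite-population correction. The function name `online_average` and its parameters are hypothetical.

```python
import math
import random
from typing import Iterator, List, Tuple


def online_average(
    rows: List[float], batch_size: int, seed: int = 0, z: float = 1.96
) -> Iterator[Tuple[int, float, float]]:
    """Yield (rows_seen, running_mean, half_width) after each batch.

    Assumptions (not from the source): rows are processed in random order,
    and the interval is mean +/- z * s / sqrt(n), a normal approximation.
    """
    order = list(rows)
    random.Random(seed).shuffle(order)  # random order so each prefix is a sample
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations (Welford's method)
    for value in order:
        n += 1
        delta = value - mean
        mean += delta / n
        m2 += delta * (value - mean)
        if n % batch_size == 0 or n == len(order):
            variance = m2 / (n - 1) if n > 1 else 0.0
            yield n, mean, z * math.sqrt(variance / n)
```

A caller can stop consuming the generator as soon as the reported half-width is small enough, which is the kind of online control over processing that the abstract describes.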
Systematic Multiresolution and its Application to the World Wide Web
In Proc. 15th International Conference on Data Engineering, 1998
Cited by 7 (1 self)
Many emerging environments are increasingly facing the problem where the requirements of applications easily outstrip the system resources. This is particularly acute in the World Wide Web (WWW) and many data-intensive applications like OLAP and multimedia databases. In this paper, we address this problem in the Web context via systematic multiresolution, i.e., a framework for providing responses at different qualities (resolutions) and costs. We validate our conceptual contributions by implementing NetBlitz, a multiresolution-based proxy server on the WWW. NetBlitz addresses two key problems facing the Web: high latencies and heterogeneity of client resources and requirements. It solves these problems by dynamically generating the "required version" of a web object based on client preferences and capabilities. We also propose novel multiresolution-aware caching techniques that further improve performance. Finally, we experimentally demonstrate the utility of multiresolution and the c...
Discovery Of Multiple-Level Rules From Large Databases
1996
Cited by 6 (0 self)
With the widespread computerization in business, government, and science, the efficient and effective discovery of interesting information from large databases becomes essential. Data mining or Knowledge Discovery in Database (KDD) emerges as a solution to the data analysis problems faced by many organizations. Previous studies on data mining have been focused on the discovery of knowledge at a single conceptual level, either at the primitive level or at a rather high conceptual level. However, it is often desirable to discover knowledge at multiple conceptual levels, which will provide a spectrum of understanding, from general to specific, for the underlying data. In this thesis, we first introduce the conceptual hierarchy, a hierarchical organization of the data in the databases. Two algorithms for dynamic adjustment of conceptual hierarchies are developed, as well as another algorithm for automatic generation of conceptual hierarchies for numerical attributes. In addition, a set of ...

