Results 1 - 10 of 29
Data cube: A relational aggregation operator generalizing group-by, cross-tab, and sub-totals
, 1996
Cited by 630 (6 self)
Abstract. Data analysis applications typically aggregate data across many dimensions looking for anomalies or unusual patterns. The SQL aggregate functions and the GROUP BY operator produce zero-dimensional or one-dimensional aggregates. Applications need the N-dimensional generalization of these operators. This paper defines that operator, called the data cube or simply cube. The cube operator generalizes the histogram, crosstabulation, roll-up, drill-down, and sub-total constructs found in most report writers. The novelty is that cubes are relations. Consequently, the cube operator can be imbedded in more complex non-procedural data analysis programs. The cube operator treats each of the N aggregation attributes as a dimension of N-space. The aggregate of a particular set of attribute values is a point in this space. The set of points forms an N-dimensional cube. Super-aggregates are computed by aggregating the N-cube to lower dimensional spaces. This paper (1) explains the cube and roll-up operators, (2) shows how they fit in SQL, (3) explains how users can define new aggregate functions for cubes, and (4) discusses efficient techniques to compute the cube. Many of these features are being added to the SQL Standard.
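As a rough illustration of the idea in this abstract, the sketch below computes a sum for every subset of the grouping attributes, so the result holds the full-detail groups plus all lower-dimensional super-aggregates. It is not the paper's algorithm or its SQL syntax: the function name `cube`, the `ALL` placeholder string, and the sample `sales` rows are assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations

ALL = "ALL"  # assumed marker for a dimension that has been aggregated away


def cube(rows, dims, measure):
    """Sum `measure` over every subset of `dims` (2**N group-bys in total)."""
    result = defaultdict(int)
    for r in range(len(dims) + 1):
        for kept in combinations(dims, r):
            for row in rows:
                # Dimensions outside `kept` collapse to the ALL marker.
                key = tuple(row[d] if d in kept else ALL for d in dims)
                result[key] += row[measure]
    return dict(result)


# Hypothetical sample data, invented for illustration only.
sales = [
    {"model": "Chevy", "year": 1994, "units": 50},
    {"model": "Chevy", "year": 1995, "units": 85},
    {"model": "Ford", "year": 1994, "units": 60},
]
totals = cube(sales, ["model", "year"], "units")
```

Each key in `totals` is a point in the two-dimensional space of (model, year); keys containing `ALL` are the sub-totals and the grand total. This naive version rescans the rows once per subset, so it shows the semantics only and says nothing about efficient computation.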
Integrating Association Rule Mining with Relational Database Systems: Alternatives and Implications
- In SIGMOD
, 1998
Cited by 101 (5 self)
Data mining on large data warehouses is becoming increasingly important. In support of this trend, we consider a spectrum of architectural alternatives for coupling mining with database systems. These alternatives include: loose-coupling through a SQL cursor interface; encapsulation of a mining algorithm in a stored procedure; caching the data to a file system on-the-fly and mining; tight-coupling using primarily user-defined functions; and SQL implementations for processing in the DBMS. We comprehensively study the option of expressing the mining algorithm in the form of SQL queries using Association rule mining as a case in point. We consider four options in SQL-92 and six options in SQL enhanced with object-relational extensions (SQL-OR). Our evaluation of the different architectural alternatives shows that from a performance perspective, the Cache-Mine option is superior, although the performance of the SQL-OR option is within a factor of two. Both the Cache-Mine and the SQL-OR app...
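To make "expressing the mining algorithm in the form of SQL queries" concrete, here is a minimal sketch of one step of association rule mining, counting the support of item pairs with a self-join. It is not one of the formulations evaluated in the paper; the table `sales(tid, item)`, the sample rows, and the `min_support` threshold are assumptions, and the query assumes each (tid, item) pair appears at most once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (tid INTEGER, item TEXT)")
# Hypothetical transactions, invented for illustration only.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [
        (1, "bread"), (1, "milk"),
        (2, "bread"), (2, "milk"), (2, "eggs"),
        (3, "milk"), (3, "eggs"),
    ],
)

min_support = 2  # assumed threshold: minimum number of transactions

# Self-join on the transaction id; a.item < b.item keeps each pair once.
frequent_pairs = conn.execute(
    """
    SELECT a.item, b.item, COUNT(*) AS support
    FROM sales AS a
    JOIN sales AS b ON a.tid = b.tid AND a.item < b.item
    GROUP BY a.item, b.item
    HAVING COUNT(*) >= ?
    ORDER BY a.item, b.item
    """,
    (min_support,),
).fetchall()
```

With the sample rows, `frequent_pairs` holds the pairs (bread, milk) and (eggs, milk), each with support 2. Larger itemsets would need further joins against the previous pass, which is where the choice of SQL formulation starts to matter for performance.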
A Linear Method for Deviation Detection in Large Databases
, 1996
Cited by 74 (1 self)
We describe the problem of finding deviations in large data bases. Normally, explicit information outside the data, like integrity constraints or predefined patterns, is used for deviation detection. In contrast, we approach the problem from the inside of the data, using the implicit redundancy of the data. We give a formal description of the problem and present a linear algorithm for detecting deviations. Our solution simulates a mechanism familiar to human beings: after seeing a series of similar data, an element disturbing the series is considered an exception. We also present experimental results from the application of this algorithm on real-life datasets showing its effectiveness.
Index Terms: Data Mining, Knowledge Discovery, Deviation, Exception, Error
Introduction
The importance of detecting deviations (or exceptions) in data has been recognized in the fields of Databases and Machine Learning for a long time. Deviations have been often viewed as outliers, or errors, or nois...
Using SQL to Build New Aggregates and Extenders for Object-Relational Systems
- In Proceedings of 26th International Conference on Very Large Data Bases
, 2000
Cited by 42 (12 self)
User-defined Aggregates (UDAs) provide a versatile mechanism for extending the power and applicability of Object-Relational Databases (O-R DBs). In this paper, we describe the AXL system that supports an SQL-based language for introducing new UDAs. AXL is easy to learn and use for database programmers because it preserves the constructs, programming paradigm and data types of SQL (whereas there is an `impedance mismatch' between SQL and the procedural languages of user-defined functions currently used in O-R DBs). AXL will also inherit the benefits of database query languages, such as scalability, data independence and parallelizability. In this paper, we show that, while adding only minimal extensions to SQL, AXL is very powerful and capable of expressing complex algorithms efficiently. We demonstrate this by coding data mining functions and other advanced applications that, previously, had been a major problem for SQL databases. Due to its flexibility, SQL-compati...
Mining Generalized Association Rules and Sequential Patterns Using SQL Queries
- Proc. of 4th Intl. Conf. on Knowledge Discovery and Data Mining (KDD98)
, 1998
Cited by 19 (1 self)
Database integration of mining is becoming increasingly important with the installation of larger and larger data warehouses built around relational database technology. Most of the commercially available mining systems integrate loosely (typically, through an ODBC or SQL cursor interface) with data stored in DBMSs. In cases where the mining algorithm makes multiple passes over the data, it is also possible to cache the data in flat files rather than retrieve multiple times from the DBMS, to achieve better performance. Recent studies have found that for association rule mining, with carefully tuned SQL formulations it is possible to achieve performance comparable to systems that cache the data in files outside the DBMS. The SQL implementation has potential for offering other qualitative advantages like automatic parallelization, development ease, portability and inter-operability with relational operators. In this paper, we present several alternatives for formulating as ...
Storing and Querying XML Data in Object-Relational DBMSs
, 2002
Cited by 10 (1 self)
As the popularity of eXtensible Markup Language (XML) continues to increase at an astonishing pace, data management systems for storing and querying large repositories of XML data are urgently needed.
Extending the ODMG Object Model with Composite Objects
- In OOPSLA
, 1998
Cited by 9 (0 self)
In this paper we extend the ODMG object data model with composite objects. A composite object is an object built by aggregating other component objects. Exclusiveness and dependency constraints, as well as referential integrity, can be associated with composition relationships among objects. Our composite object model is developed in the framework of the ODMG object database standard data model, but can be used in both object-oriented and object-relational database systems. In the paper, we propose a language for defining composite objects and we define the semantics of update operations on composite objects.
Keywords: Object-oriented database systems, composite objects, integrity constraints, data models.
INTRODUCTION
Object-oriented DBMS (OODBMS) and object-relational DBMS (ORDBMS) are establishing themselves as the new generation DBMS. Object database systems overcome the limitations of relational systems with respect to several emerging data-intensive applications because of thei...
Concept Hierarchy in Data Mining: Specification, Generation and Implementation
, 1997
Cited by 9 (0 self)
Data mining is the nontrivial extraction of implicit, previously unknown, and potentially useful information from data. As one of the most important forms of background knowledge, concept hierarchy plays a fundamentally important role in data mining. It is the purpose of this thesis to study some aspects of concept hierarchy such as the automatic generation and encoding technique in the context of data mining. After the discussion on the basic terminology and categorization, automatic generation of concept hierarchies is studied for both nominal and numerical hierarchies. One algorithm is designed for determining the partial order on a given set of nominal attributes. The resulting partial order is a useful guide for users to finalize the concept hierarchy for their particular data mining tasks. Based on hierarchical and partitioning clustering methods, two algorithms are proposed for the automatic generation of numerical hierarchies. The quality and performance comparisons indicate that the ...
Scalable Mining for Classification Rules in Relational Databases
- In Proceedings of the International Database Engineering & Application Symposium
, 1998
Cited by 8 (1 self)
Classification is a key function of many "business intelligence" toolkits and a fundamental building block in data mining. Immense data may be needed to train a classifier for good accuracy. The state-of-the-art classifiers [21, 25] need an in-memory data structure of size O(N), where N is the size of the training data, to achieve efficiency. For large data sets, such a data structure will not fit in the internal memory. The best previously known classifier does a quadratic number of I/Os for large N. In this paper, we propose a novel classification algorithm (classifier) called MIND (MINing in Databases). MIND can be phrased in such a way that its implementation is very easy using the extended relational calculus SQL, and this in turn allows the classifier to be built into a relational database system directly. MIND is truly scalable with respect to I/O efficiency, which is important since scalability is a key requirement for any data mining algorithm. We built a prototype of MIND in the...
User Defined Aggregates in Object-Relational Systems
- In Proceedings of the 16th International Conference on Data Engineering
, 2000
Cited by 8 (2 self)
User-defined aggregates are essential in many advanced database applications, particularly in expressing data mining functions, but they find little support in current systems including Object-Relational databases. Three serious limitations of current systems are (i) the inability of introducing new aggregates (e.g., by coding them in procedural language as originally proposed in SQL3), (ii) the inability of returning partial results during the computation (e.g., to support online aggregation), and (iii) the inability of using aggregates in recursive queries (e.g., to express Bill of Materials and optimized graph searches). In this paper, we present a unified solution to these problems which realizes the original SQL3 proposal for user-defined aggregates (UDAs), and adds significant improvements in terms of expressive power and ease of use: in fact our SQL-AG system also supports online aggregation, monotonic aggregation, and a high-level aggregate definition language named SADL. We focu...
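For readers unfamiliar with the concept, the sketch below shows what a user-defined aggregate coded in a procedural language can look like, using the `create_aggregate` hook of Python's standard `sqlite3` module. It is not SQL-AG, SADL, or AXL, and it does not cover online aggregation or recursion; the aggregate name `value_range`, the `readings` table, and its rows are assumptions made for this example.

```python
import sqlite3


class ValueRange:
    """Aggregate returning max - min of a group: initialize, iterate, terminate."""

    def __init__(self):
        # Initialize: no values seen yet.
        self.lo = None
        self.hi = None

    def step(self, value):
        # Iterate: called once per row; NULLs are skipped.
        if value is None:
            return
        self.lo = value if self.lo is None else min(self.lo, value)
        self.hi = value if self.hi is None else max(self.hi, value)

    def finalize(self):
        # Terminate: called once per group to produce the result.
        return None if self.lo is None else self.hi - self.lo


conn = sqlite3.connect(":memory:")
conn.create_aggregate("value_range", 1, ValueRange)
conn.execute("CREATE TABLE readings (sensor TEXT, temp INTEGER)")
# Hypothetical sample rows, invented for illustration only.
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("a", 10), ("a", 17), ("b", 5), ("b", 5), ("a", 12)],
)
ranges = dict(
    conn.execute("SELECT sensor, value_range(temp) FROM readings GROUP BY sensor")
)
```

Once registered, `value_range` is used in a query like any built-in aggregate. Note that the result only becomes available in `finalize`, which is the kind of restriction the abstract's point (ii) about returning partial results refers to.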

