Optimization of Sequence Queries in Database Systems
, 2001
Cited by 16 (3 self)
The need to search for complex and recurring patterns in database sequences is shared by many applications. In this work, we discuss how to express and support efficiently sophisticated sequential pattern queries in relational database systems. Thus, we first
The MD-join: An Operator for Complex OLAP
- In Proc. ICDE
, 2001
Cited by 16 (4 self)
OLAP queries (i.e., group-by or cube-by queries with aggregation) have proven to be valuable for data analysis and exploration. Many decision support applications need very complex OLAP queries, requiring a fine degree of control over both the group definition and the aggregates that are computed. For example, suppose that the user has access to a data cube whose measure attribute is Sum(Sales). Then the user might wish to compute the sum of sales in New York and the sum of sales in California for those data cube entries in which Sum(Sales) > $1,000,000.
Declarative Data Merging With Conflict Resolution
- International Conference on Information Quality (IQ 2002)
, 2002
Cited by 14 (1 self)
Database integration is a growing and increasingly important field in both research and industry. Integration requires many steps, from initial schema integration and schema mapping, to data scrubbing and cleansing, and finally to data merging. While much research has concentrated on the first steps performed at the schema level, there are only a few publications about actual, practical merging of the data in an integrated database or in a query against multiple databases. When merging data, especially data from autonomous sources, there is a large potential for decreasing the quality of the merged data, even below the level of the original sources. The main reasons for decreased quality are data conflicts among the sources. To address this problem, we define resolution functions for merging conflicting data. We present several alternatives for merging relational data sources with common queries through grouping & aggregating and through partitioning & joining. The resulting queries use resolution functions and can be used to migrate data from multiple sources to a target database, or to define an integrating view on multiple sources. We describe and analyze the advantages of the different approaches, and describe our practical solution in the framework of a schema mapping and data transformation tool.
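The grouping & aggregating idea in the abstract above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names (`merge_by_grouping`, `first_non_null`, `longest`), the dictionary-based rows, and the sample attributes are all hypothetical, chosen only to show how rows that share a key can be collapsed by per-attribute resolution functions.

```python
from collections import defaultdict

# Hypothetical resolution functions: each takes the conflicting values
# for one attribute and returns a single resolved value.
def first_non_null(values):
    return next((v for v in values if v is not None), None)

def longest(values):
    present = [v for v in values if v is not None]
    return max(present, key=len) if present else None

def merge_by_grouping(sources, key, resolvers):
    """Group rows from all sources by `key`, then resolve each attribute."""
    groups = defaultdict(list)
    for source in sources:
        for row in source:
            groups[row[key]].append(row)
    merged = []
    for key_value, rows in groups.items():
        out = {key: key_value}
        for attr, resolve in resolvers.items():
            out[attr] = resolve([row.get(attr) for row in rows])
        merged.append(out)
    return merged

# Illustrative data only (assumed, not from the paper).
source_a = [{"id": 1, "name": "J. Smith", "price": 10}]
source_b = [{"id": 1, "name": "John Smith", "price": None},
            {"id": 2, "name": "Ann", "price": 5}]
result = merge_by_grouping([source_a, source_b], "id",
                           {"name": longest, "price": first_non_null})
```

In SQL terms this corresponds roughly to a union of the sources followed by a GROUP BY on the key, with the resolution functions acting as aggregates.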
Expressing and Optimizing Sequence Queries in Database Systems
- ACM Trans. on Database Systems
, 2004
Cited by 14 (2 self)
In this paper, we investigate the design and optimization of a query language capable of expressing and efficiently supporting the search for complex sequential patterns in database systems. Thus, we first introduce SQL-TS, an extension of SQL to express these patterns, and then we study how to optimize the queries for this language. We take the optimal text search algorithm of Knuth, Morris and Pratt, and generalize it to handle complex queries on sequences. Our algorithm exploits the interdependencies between the elements of a pattern to minimize repeated passes over the same data. Experimental results on typical sequence queries, such as double bottom queries, confirm that substantial speedups are achieved by our new optimization techniques.
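For readers unfamiliar with the starting point, the following is a minimal Python sketch of the classic Knuth-Morris-Pratt search on plain symbol sequences. It shows only the textbook algorithm, not the generalization to SQL-TS pattern queries described in the abstract; the function names are illustrative.

```python
def failure_table(pattern):
    """fail[i] is the length of the longest proper prefix of
    pattern[:i + 1] that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail

def kmp_search(text, pattern):
    """Return the start index of every occurrence of pattern in text."""
    if not pattern:
        return []
    fail = failure_table(pattern)
    matches = []
    k = 0  # number of pattern symbols currently matched
    for i, symbol in enumerate(text):
        # On a mismatch, fall back using the table instead of
        # re-reading earlier input symbols.
        while k > 0 and symbol != pattern[k]:
            k = fail[k - 1]
        if symbol == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = fail[k - 1]
    return matches
```

The key property is that the input is never rescanned: after a mismatch, the failure table tells the matcher how much of the pattern is still known to match, which is the behavior the abstract describes as minimizing repeated passes over the same data.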
SQL Database Primitives for Decision Tree Classifiers
, 2001
Cited by 11 (2 self)
Scalable data mining in large databases is one of today's challenges to database technologies. Thus, substantial effort is dedicated to a tight coupling of database and data mining systems, leading to database primitives supporting data mining tasks. In order to support a wide range of tasks and to be of general use, these primitives should be building blocks rather than implementations of specific algorithms. In this paper, we describe primitives for building and applying decision tree classifiers. Based on the analysis of available algorithms and previous work in this area, we have identified operations which are useful for a number of classification algorithms. We discuss the implementation of these primitives on top of a commercial DBMS and present experimental results demonstrating the performance benefit.
Efficient similarity-based operations for data integration
- Data Knowl. Eng
Cited by 11 (1 self)
Dealing with discrepancies in data is still a big challenge in data integration systems. The problem occurs both when eliminating duplicates from semantically overlapping sources and when combining complementary data from different sources. Though using SQL operations like grouping and join seems to be a viable way, they fail if the attribute values of the potential duplicates or related tuples are not equal but only similar by certain criteria. As a solution to this problem, we present in this paper similarity-based variants of grouping and join operators. The extended grouping operator produces groups of similar tuples; the extended join combines tuples satisfying a given similarity condition. We describe the semantics of these operators, discuss efficient implementations for the edit distance similarity, and present evaluation results. Finally, we give application examples from the context of a data reconciliation project for looted art.
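As a rough illustration of a join under an edit distance condition, here is a minimal Python sketch. It uses the standard Levenshtein dynamic program and a naive nested-loop join, so it shows the semantics only and not the efficient implementations the abstract refers to; `similarity_join` and its `max_distance` parameter are hypothetical names.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def similarity_join(left, right, max_distance):
    """Pair every left value with every right value whose edit
    distance is at most max_distance (naive nested loop)."""
    return [(l, r) for l in left for r in right
            if edit_distance(l, r) <= max_distance]

# Illustrative values only (assumed, not from the paper).
pairs = similarity_join(["Monet", "Renoir"],
                        ["Manet", "Renoir", "Degas"], 1)
```

An ordinary equi-join would return only the exact "Renoir" match here; the similarity condition also pairs values that differ by a single character.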
A native extension of SQL for mining data streams
- In Proc. ACM SIGMOD Int. Conf. on Management of Data
, 2005
Cited by 11 (3 self)
ESL enables users to develop stream applications in an SQL-like
ATLaS: a Small but Complete SQL Extension for Data Mining and Data Streams
, 2003
Cited by 9 (0 self)
DBMSs have long suffered from SQL's lack of power and extensibility. We have implemented ATLaS [1], a powerful database language and system that enables users to develop complete data-intensive applications in SQL, by writing new aggregates and table functions in SQL rather than in procedural languages as in current Object-Relational systems. As a result, ATLaS' SQL is Turing-complete [7], and is very suitable for advanced data-intensive applications, such as data mining and stream queries. The ATLaS system is now available for download along with a suite of applications [1], including various data mining functions, that have been coded in ATLaS' SQL and execute with a modest (20-40%) performance overhead with respect to the same applications written in C/C++. Our proposed demo will illustrate the key features and applications of ATLaS. In particular, we will demonstrate: ATLaS' SQL features, including its native support for user-defined aggregates and table functi
ATLaS: A native extension of SQL for data mining
- In SIAM International Conference on Data Mining (SDM
Cited by 8 (3 self)
A lack of power and extensibility in their query languages has seriously limited the generality of DBMSs and hampered their ability to support data mining applications. Thus, there is a pressing need for more general mechanisms for extending DBMSs to efficiently support database-centric data mining applications. To satisfy this need, we propose a new extensibility mechanism for SQL-compliant DBMSs, and demonstrate its power in supporting decision support applications. The key extension is the ability to define new table functions and aggregate functions in SQL, rather than in external procedural languages as Object-Relational (O-R) DBMSs currently do. This simple extension turns SQL into a powerful language for decision-support applications, including ROLAPs, time-series queries, stream-oriented processing, and data mining functions. First, we discuss the use of ATLaS for data mining applications, and then the architecture and techniques used in its realization.
Conflict Handling Strategies in an Integrated Information System
, 2006
Cited by 8 (2 self)
Integrated information systems provide users and applications with a unified view of heterogeneous data sources. To provide a single consistent result for every object represented in these data sources, data fusion is concerned with resolving data inconsistencies within and among the sources. We present a classification of conflict resolution strategies and show how these are implemented within an integrated information system, the Humboldt-Merger.

