A Uniform Approach for Selecting Views and Indexes in a Data Warehouse
 In Proceedings of the 1997 International Database Engineering and Applications Symposium
, 1997
Cited by 4 (3 self)
Careful selection of aggregate views and some of their most used indexes to materialize in a data warehouse reduces the warehouse query response time as well as warehouse maintenance cost under some storage space constraint. Data warehouses collect and store large amounts of integrated enterprise data from a number of independent data sources over a long period of time. Warehouse data are used for online analytical processing to assist management in making quick and competitive business decisions. Precomputing and storing summary tables (materialized views) reduces the amount of time needed to recompute these views across several source tables in order to answer complex warehouse queries. A data cube is an elegant way of representing aggregate information in a warehouse and is an n-dimensional view with 2^n subviews. This paper presents a uniform technique for selecting the subviews of the data cube and their indexes to materialize in order to produce the best resultant benefit to t...
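The 2^n count in the abstract above follows from treating each subview as a group-by over a subset of the n dimensions. The sketch below only enumerates those subsets; it is not the paper's selection technique, and the dimension names are hypothetical.

```python
from itertools import combinations


def cube_subviews(dimensions):
    """Return every subset of the dimensions, from the empty group-by
    (the grand total) up to the full set of dimensions."""
    views = []
    for size in range(len(dimensions) + 1):
        for combo in combinations(dimensions, size):
            views.append(combo)
    return views


# Hypothetical dimension names, chosen only for illustration.
dims = ["part", "supplier", "customer"]
views = cube_subviews(dims)
print(len(views) == 2 ** len(dims))  # one subview per subset of dimensions
for view in views:
    print(view)
```

Because the number of candidate subviews doubles with every added dimension, materializing all of them quickly becomes impractical, which is why a selection scheme under a storage constraint is needed.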
Measuring the Performance of Database Object Horizontal Fragmentation Schemes
 In Proceedings of the 3rd IEEE International Database Engineering and Applications Symposium (IDEAS99)
, 1999
Cited by 4 (2 self)
A horizontal fragment of a database class in an object-oriented database system contains subsets of its instance objects (or class extents) reflecting the way applications are accessing database objects. Allocating well-defined fragments of classes to distributed sites has the advantage of minimizing transmission costs of data to remote sites as well as minimizing retrieval time of data needed locally. All algorithms so far proposed in the literature for defining horizontal fragments of database objects are based on information from earlier static requirements analysis. Thus, a refragmentation of the system is needed when application access and schema information have undergone sufficient changes. In this paper, we provide a technique for measuring the performance of object horizontal fragments placed at distributed sites. This work provides a platform for dynamic object horizontal fragmentation and for comparing object horizontal fragmentation schemes. Keywords: Object-oriented data...
A Customizable Hybrid Approach to Data Clustering
 Proc. of the 2003 ACM Symposium on Applied Computing
, 2003
Cited by 4 (1 self)
Most current data clustering algorithms in data mining are based on a distance calculation in a certain metric space. For Spatial Database Systems (SDBS), the Euclidean distance between two data points is often used to represent the relationship between data points. However, in some spatial settings and many other applications, distance alone is not enough to represent all the attributes of the relation between data points. We need a more powerful model to record more relational information between data objects. This paper adopts a graph model by which a database is regarded as a graph: each vertex of the graph represents a data point, and each edge, weighted or unweighted, is used to record the relation between two data points connected by the edge. Based on the graph model, this paper presents a set of cluster analysis criteria to guide data clustering. The criteria can be used to measure clustering results and help improve the quality of clustering. Further, a customizable algorithm using the criteria is proposed and implemented. This algorithm can produce clusters according to users' specifications. Preliminary experiments show encouraging results.
Research Issues in Automatic Database Clustering
 SIGMOD Record
Cited by 4 (0 self)
While a lot of work has been published on clustering of data on storage media, little has been done about automating this process. This is an important area because with data proliferation, human attention has become a precious and expensive resource. Our goal is to develop an automatic and dynamic database clustering technique that will dynamically recluster a database with little intervention of a database administrator (DBA) and maintain an acceptable query response time at all times. In this paper we describe the issues that need to be solved when developing such a technique.
Text mining biomedical literature for discovering gene-to-gene relationships: a comparative study of algorithms
 IEEE/ACM Transactions on Computational Biology and Bioinformatics, Vol. 2, No. 1, Jan-March
, 2005
Cited by 3 (1 self)
Abstract—Partitioning closely related genes into clusters has become an important element of practically all statistical analyses of microarray data. A number of computer algorithms have been developed for this task. Although these algorithms have demonstrated their usefulness for gene clustering, some basic problems remain. This paper describes our work on extracting functional keywords from MEDLINE for a set of genes that are isolated for further study from microarray experiments based on their differential expression patterns. The sharing of functional keywords among genes is used as a basis for clustering in a new approach called BEA-PARTITION in this paper. Functional keywords associated with genes were extracted from MEDLINE abstracts. We modified the Bond Energy Algorithm (BEA), which is widely accepted in psychology and database design but is virtually unknown in bioinformatics, to cluster genes by functional keyword associations. The results showed that BEA-PARTITION and the hierarchical clustering algorithm outperformed k-means clustering and self-organizing map by correctly assigning 25 of 26 genes in a test set of four known gene groups. To evaluate the effectiveness of BEA-PARTITION for clustering genes identified by microarray profiles, 44 yeast genes that are differentially expressed during the cell cycle and have been widely studied in the literature were used as a second test set. Using established measures of cluster quality, the results produced by BEA-PARTITION had higher purity, lower entropy, and higher mutual information than those produced by k-means and self-organizing map. Whereas BEA-PARTITION and the hierarchical clustering produced similar quality of clusters, BEA-PARTITION provides clear cluster boundaries compared to the hierarchical clustering. BEA-PARTITION is simple to implement and provides a powerful approach to clustering genes or to any clustering problem where starting matrices are available from experimental observations.
Index Terms—Bond energy algorithm, microarray, MEDLINE, text analysis, cluster analysis, gene function.
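The abstract above compares clusterings by purity and entropy without defining them. The sketch below uses the common textbook definitions, which are an assumption here and may differ in detail from the paper's; the gene names and group labels are hypothetical.

```python
from collections import Counter
from math import log2


def purity(clusters, labels):
    """Fraction of items that belong to the majority true group of their cluster."""
    total = sum(len(cluster) for cluster in clusters)
    hits = sum(
        max(Counter(labels[item] for item in cluster).values())
        for cluster in clusters
        if cluster
    )
    return hits / total


def entropy(clusters, labels):
    """Size-weighted average of the label entropy (in bits) inside each cluster."""
    total = sum(len(cluster) for cluster in clusters)
    result = 0.0
    for cluster in clusters:
        if not cluster:
            continue
        counts = Counter(labels[item] for item in cluster)
        h = -sum((n / len(cluster)) * log2(n / len(cluster)) for n in counts.values())
        result += (len(cluster) / total) * h
    return result


# Hypothetical genes and known groups, for illustration only.
labels = {"g1": "A", "g2": "A", "g3": "B", "g4": "B"}
good = [["g1", "g2"], ["g3", "g4"]]
mixed = [["g1", "g3"], ["g2", "g4"]]
print(purity(good, labels), entropy(good, labels))
print(purity(mixed, labels), entropy(mixed, labels))
```

Under these definitions a clustering that matches the known groups scores high purity and low entropy, which is the direction of improvement the abstract reports.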
Accommodating Dimension Hierarchies in a Data Warehouse View/Index Selection Scheme
 Systems Development Methods for the Next Century
, 1997
Cited by 2 (2 self)
Storing a vast number of aggregate tables (materialized views) of the base data collected from its various independent data sources is one way warehousing systems provide fast access to data requested by complex warehouse queries. A data warehouse collects, stores and integrates large amounts of data from various function-oriented databases over a long period of time, which is used for online analytical processing (OLAP). In addition to storing views which project mostly on primary key attributes (e.g., customerid), materializing some of their indexes helps reduce query response time at the expense of increasing maintenance cost for stored tables and diminishing storage space. Thus, in order to achieve near-optimal query response time, maintenance cost and storage space utilization, schemes that enable careful selection of views and indexes are required. For an even better system performance, extending these selection schemes to accommodate views that are grouped on dimension at...
Manufacturing Cell Formation by State-Space Search
Cited by 2 (1 self)
This paper addresses the problem of grouping machines in order to design cellular
Approximation algorithms for the minimum bends traveling salesman problem
 In Proceedings of the 8th Conference on Integer Programming and Combinatorial Optimization
, 2000
Cited by 2 (0 self)
The problem of traversing a set of points in the order that minimizes the total distance traveled (traveling salesman problem) is one of the most famous and well-studied problems in combinatorial optimization. It has many applications, and has been a testbed for many of the most useful ideas in algorithm design and analysis. The usual metric, minimizing the total distance traveled, is an important one, but many other metrics are of interest. In this paper, we introduce the metric of minimizing the number of turns in the tour, given that the input points are in the Euclidean plane. To our knowledge this metric has not been studied previously. It is motivated by applications in robotics and in the movement of other heavy machinery: for many such devices turning is an expensive operation. We give approximation algorithms for several variants of the traveling salesman problem for which the metric is to minimize the number of turns. We call this the minimum bend traveling salesman problem. For the case of an arbitrary set of n points in the Euclidean plane, we give an O(lg z)-approximation algorithm, where z is the maximum number of collinear points. In the worst case z can be as big as n, but z will often be much smaller. For the case when the lines are restricted to being either horizontal or vertical, we give a 2-approximation algorithm. If we have the further restriction that no two points are allowed to have the same x- or y-coordinate, we give an algorithm that finds a tour which makes at most two turns more than the optimal tour. Thus we have an approximation algorithm with an additive, rather than a multiplicative, error bound. Beyond the additive error bound, our algorithm for this problem introduces several interesting algorithmic techniques for decomposing sets of points in the Euclidean plane that we believe to be of independent interest.
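The objective described above, the number of turns in a tour, can be evaluated for a given tour as in the sketch below. It only scores a tour and is not one of the paper's approximation algorithms. It assumes the tour is a closed polyline of integer (x, y) vertices with no repeated consecutive vertex, and it counts a full reversal as a turn, which is an assumption rather than something the abstract states.

```python
def count_turns(tour):
    """Count the vertices of a closed polyline at which the direction of travel changes."""
    n = len(tour)
    turns = 0
    for i in range(n):
        ax, ay = tour[i - 1]          # previous vertex (wraps around at i == 0)
        bx, by = tour[i]              # current vertex
        cx, cy = tour[(i + 1) % n]    # next vertex (wraps around at the end)
        ux, uy = bx - ax, by - ay     # incoming direction
        vx, vy = cx - bx, cy - by     # outgoing direction
        cross = ux * vy - uy * vx     # non-zero when the two directions are not parallel
        dot = ux * vx + uy * vy       # negative when the direction reverses
        if cross != 0 or dot < 0:
            turns += 1
    return turns


square = [(0, 0), (2, 0), (2, 2), (0, 2)]
with_midpoint = [(0, 0), (1, 0), (2, 0), (2, 2), (0, 2)]
print(count_turns(square))         # 4
print(count_turns(with_midpoint))  # 4, the collinear midpoint adds no turn
```

The second tour shows why collinear points matter for this metric: visiting an extra point on an existing line costs no additional turn.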
A Two-Phase Approach to Data Allocation in Distributed Databases
, 1995
Cited by 1 (0 self)
In this paper, we propose a two-phase approach to the problem of optimal allocation of data objects (fragments) on a network in a distributed database system. In the first phase, we perform fragment clustering, in which we form groupings of fragments that tend to be accessed together. In the second phase, we use a "divide and conquer" search technique to allocate clusters to the computing nodes (sites) in the network. We show, via complexity analysis, that the combined process of clustering and data allocation takes time that is polynomial with respect to the number of objects and sites. We also show, via experimental analysis, that our approach produces solutions that are close to optimal for a wide range of fragmentations, queries and network structures. 1 Introduction Data allocation is a critical aspect of distributed database systems: a poorly-designed data allocation can lead to inefficient computation, high access costs, and high network loads [15, 16] whereas a well-desig...
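The first phase described above groups fragments that tend to be accessed together. One simple way to illustrate that idea, which is an assumption here and not the paper's clustering procedure, is to count how often two fragments appear in the same query and merge pairs whose count reaches a threshold. The fragment names, the query log, and the `threshold` parameter are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_access_counts(queries):
    """Count, for each pair of fragments, the number of queries touching both."""
    counts = Counter()
    for fragments in queries:
        for a, b in combinations(sorted(set(fragments)), 2):
            counts[(a, b)] += 1
    return counts


def cluster_fragments(queries, threshold):
    """Merge fragments whose co-access count is at least `threshold` (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for fragments in queries:
        for fragment in fragments:
            find(fragment)  # register every fragment, even ones never merged
    for (a, b), count in co_access_counts(queries).items():
        if count >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for fragment in list(parent):
        groups.setdefault(find(fragment), set()).add(fragment)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical query log: each entry lists the fragments one query reads.
log = [["F1", "F2"], ["F1", "F2"], ["F3", "F4"], ["F3", "F4"], ["F2", "F3"]]
print(cluster_fragments(log, threshold=2))
```

The resulting clusters, rather than individual fragments, would then be the units handed to an allocation step such as the second phase the abstract mentions.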
Automatic Database Clustering Using Data Mining, DEXA '06
 Proceedings of the 17th International Conference on Database and Expert Systems Applications
, 2006
Cited by 1 (1 self)
Because of data proliferation, efficient access methods and data storage techniques have become increasingly critical to maintain an acceptable query response time. One way to improve query response time is to reduce the number of disk I/Os by partitioning the database vertically (attribute clustering) and/or horizontally (record clustering). A clustering is optimized for a given set of queries. However, in dynamic systems the queries change with time, the clustering in place becomes obsolete, and the database needs to be reclustered dynamically. In this paper we discuss an efficient algorithm for attribute clustering that dynamically and automatically generates attribute clusters based on closed item sets mined from the attribute sets found in the queries running against the database.
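A closed item set is one that cannot be extended with another item without losing support, which makes it equal to the intersection of all the transactions containing it. The sketch below mines closed attribute sets from a small query log by building such intersections. It is a brute-force illustration, not the paper's algorithm, and the attribute names and the `min_support` parameter are hypothetical.

```python
def closed_itemsets(transactions, min_support=1):
    """Return {closed item set: support} for a list of attribute sets.

    Every closed item set with support >= 1 is an intersection of some
    non-empty group of transactions, so all such intersections are collected
    incrementally and then filtered by support.
    """
    closed = set()
    for transaction in transactions:
        t = frozenset(transaction)
        closed |= {t} | {t & c for c in closed}

    result = {}
    for itemset in closed:
        if not itemset:
            continue  # skip the empty intersection
        support = sum(1 for t in transactions if itemset <= frozenset(t))
        if support >= min_support:
            result[itemset] = support
    return result


# Hypothetical attribute sets referenced by three queries.
queries = [
    {"name", "salary"},
    {"name", "salary", "dept"},
    {"name", "dept"},
]
for itemset, support in sorted(
    closed_itemsets(queries, min_support=2).items(),
    key=lambda pair: (-pair[1], sorted(pair[0])),
):
    print(sorted(itemset), support)
```

Attributes that appear together in a frequent closed set are natural candidates for the same vertical partition, since the queries that need one of them tend to need the others.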