Measuring the Performance of Database Object Horizontal Fragmentation Schemes
- In Proceedings of the 3rd IEEE International Database Engineering and Applications Symposium (IDEAS99), 1999
Cited by 4 (2 self)
A horizontal fragment of a database class in an object-oriented database system contains subsets of its instance objects (or class extents) reflecting the way applications are accessing database objects. Allocating well-defined fragments of classes to distributed sites has the advantage of minimizing transmission costs of data to remote sites as well as minimizing retrieval time of data needed locally. All algorithms so far proposed in the literature for defining horizontal fragments of database objects are based on information from earlier static requirements analysis. Thus, a re-fragmentation of the system is needed when application access and schema information have undergone sufficient changes. In this paper, we provide a technique for measuring the performance of object horizontal fragments placed at distributed sites. This work provides a platform for dynamic object horizontal fragmentation and for comparing object horizontal fragmentation schemes. Keywords: Object-oriented data...
Research Issues in Automatic Database Clustering
- SIGMOD RECORD
Cited by 4 (0 self)
While a lot of work has been published on clustering of data on storage medium, little has been done about automating this process. This is an important area because with data proliferation, human attention has become a precious and expensive resource. Our goal is to develop an automatic and dynamic database clustering technique that will dynamically re-cluster a database with little intervention of a database administrator (DBA) and maintain an acceptable query response time at all times. In this paper we describe the issues that need to be solved when developing such a technique.
The history of the cluster heat map
- The American Statistician, 2009
Cited by 4 (0 self)
The cluster heat map is an ingenious display that simultaneously reveals row and column hierarchical cluster structure in a data matrix. It consists of a rectangular tiling with each tile shaded on a color scale to represent the value of the corresponding element of the data matrix. The rows (columns) of the tiling are ordered such that similar rows (columns) are near each other. On the vertical and horizontal margins of the tiling there are hierarchical cluster trees. This cluster heat map is a synthesis of several different graphic displays developed by statisticians over more than a century. We locate the earliest sources of this display in late 19th century publications. And we trace a diverse 20th century statistical literature that provided a foundation for this most widely used of all bioinformatics displays.
Text mining biomedical literature for discovering gene-to-gene relationships: a comparative study of algorithms
- IEEE/ACM Transactions on Computational Biology and Bioinformatics, Vol. 2, No. 1, Jan-March, 2005
Cited by 3 (1 self)
Abstract—Partitioning closely related genes into clusters has become an important element of practically all statistical analyses of microarray data. A number of computer algorithms have been developed for this task. Although these algorithms have demonstrated their usefulness for gene clustering, some basic problems remain. This paper describes our work on extracting functional keywords from MEDLINE for a set of genes that are isolated for further study from microarray experiments based on their differential expression patterns. The sharing of functional keywords among genes is used as a basis for clustering in a new approach called BEA-PARTITION in this paper. Functional keywords associated with genes were extracted from MEDLINE abstracts. We modified the Bond Energy Algorithm (BEA), which is widely accepted in psychology and database design but is virtually unknown in bioinformatics, to cluster genes by functional keyword associations. The results showed that BEA-PARTITION and hierarchical clustering algorithm outperformed k-means clustering and self-organizing map by correctly assigning 25 of 26 genes in a test set of four known gene groups. To evaluate the effectiveness of BEA-PARTITION for clustering genes identified by microarray profiles, 44 yeast genes that are differentially expressed during the cell cycle and have been widely studied in the literature were used as a second test set. Using established measures of cluster quality, the results produced by BEA-PARTITION had higher purity, lower entropy, and higher mutual information than those produced by k-means and self-organizing map. Whereas BEA-PARTITION and the hierarchical clustering produced similar quality of clusters, BEA-PARTITION provides clear cluster boundaries compared to the hierarchical clustering. BEA-PARTITION is simple to implement and provides a powerful approach to clustering genes or to any clustering problem where starting matrices are available from experimental observations. 
Index Terms—Bond energy algorithm, microarray, MEDLINE, text analysis, cluster analysis, gene function.
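The abstract above compares clusterings by purity and entropy. As a rough illustration only, the sketch below computes both using one common textbook definition; the paper's exact formulas may differ, and the function names, gene ids, and group labels here are hypothetical.

```python
from collections import Counter
from math import log2


def purity(clusters, labels):
    """Fraction of items that belong to the majority class of their cluster.

    clusters: list of lists of item ids (hypothetical input shape).
    labels:   dict mapping item id -> known class.
    """
    total = sum(len(c) for c in clusters)
    majority = sum(
        max(Counter(labels[i] for i in c).values()) for c in clusters if c
    )
    return majority / total


def entropy(clusters, labels):
    """Size-weighted average of the class entropy inside each cluster (bits)."""
    total = sum(len(c) for c in clusters)
    result = 0.0
    for c in clusters:
        if not c:
            continue
        counts = Counter(labels[i] for i in c)
        h = -sum((n / len(c)) * log2(n / len(c)) for n in counts.values())
        result += (len(c) / total) * h
    return result


# Toy data: five genes, two known groups, two produced clusters.
known = {"g1": "A", "g2": "A", "g3": "B", "g4": "B", "g5": "B"}
found = [["g1", "g2", "g3"], ["g4", "g5"]]
print(purity(found, known), entropy(found, known))
```

Under these definitions a perfect clustering has purity 1 and entropy 0, which matches the direction of the comparison in the abstract (higher purity and lower entropy are better).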
Accommodating Dimension Hierarchies in a Data Warehouse View/Index Selection Scheme
- Systems Development Methods for the Next Century, 1997
Cited by 2 (2 self)
Storing a vast number of aggregate tables (materialized views) of the base data collected from its various independent data sources is one way warehousing systems provide fast access to data requested by complex warehouse queries. A data warehouse collects, stores and integrates large amounts of data from various function-oriented databases over a long period of time, which is used for online analytical processing (OLAP). In addition to storing views which project mostly on primary key attributes (e.g., customerid), materializing some of their indexes helps reduce query response time at the expense of increasing maintenance cost for stored tables and diminishing storage space. Thus, in order to achieve near optimal query response time, maintenance cost and storage space utilization, schemes that enable careful selection of views and indexes are required. For an even better system performance, extending these selection schemes to accommodate views that are grouped on dimension at...
Manufacturing Cell Formation by State-Space Search
Cited by 2 (1 self)
This paper addresses the problem of grouping machines in order to design cellular
Approximation Algorithms for the Minimum Bends Traveling Salesman Problem (Extended Abstract)
- In Proceedings of the 8th Conference on Integer Programming and Combinatorial Optimization, 2000
Cited by 1 (0 self)
Cliff Stein, David P. Wagner. Dartmouth College Computer Science Technical Report TR2000-367, May 9, 2000.
Abstract: The problem of traversing a set of points in the order that minimizes the total distance traveled (traveling salesman problem) is one of the most famous and well-studied problems in combinatorial optimization. It has many applications, and has been a testbed for many of the most useful ideas in algorithm design and analysis. The usual metric, minimizing the total distance traveled, is an important one, but many other metrics are of interest. In this paper, we introduce the metric of minimizing the number of turns in the tour, given that the input points are in the Euclidean plane. To our knowledge this metric has not been studied previously. It is motivated by applications in robotics and in the movement of other heavy machinery: for many such devices turning is an expensive operation. We give approximation algorithms for several variants of the traveling...
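To make the "number of turns" metric concrete, here is a minimal sketch that counts the bends of a given closed tour. It only evaluates the objective and is not one of the paper's approximation algorithms. The function name is hypothetical, and counting a 180-degree reversal as a single turn is an assumption.

```python
def count_bends(tour):
    """Count direction changes along a closed polygonal tour.

    tour: list of (x, y) vertices in visiting order; the last vertex
    connects back to the first. Assumes exact (e.g. integer) coordinates
    and that consecutive vertices are distinct.
    """
    n = len(tour)
    bends = 0
    for i in range(n):
        x0, y0 = tour[i - 1]
        x1, y1 = tour[i]
        x2, y2 = tour[(i + 1) % n]
        ax, ay = x1 - x0, y1 - y0      # incoming segment
        bx, by = x2 - x1, y2 - y1      # outgoing segment
        cross = ax * by - ay * bx      # zero when the segments are collinear
        dot = ax * bx + ay * by        # negative when the direction reverses
        if cross != 0 or dot < 0:
            bends += 1
    return bends


square = [(0, 0), (2, 0), (2, 2), (0, 2)]
with_midpoint = [(0, 0), (1, 0), (2, 0), (2, 2), (0, 2)]
print(count_bends(square), count_bends(with_midpoint))
```

Passing through an extra point on a straight segment adds distance bookkeeping but no bend, which is why the two example tours score the same under this metric.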
A Two-Phase Approach to Data Allocation in Distributed Databases
- 1995
Cited by 1 (0 self)
In this paper, we propose a two-phase approach to the problem of optimal allocation of data objects (fragments) on a network in a distributed database system. In the first phase, we perform fragment clustering, in which we form groupings of fragments that tend to be accessed together. In the second phase, we use a "divide and conquer" search technique to allocate clusters to the computing nodes (sites) in the network. We show, via complexity analysis, that the combined process of clustering and data allocation takes time that is polynomial with respect to the number of objects and sites. We also show, via experimental analysis, that our approach produces solutions that are close to optimal for a wide range of fragmentations, queries and network structures.
1 Introduction
Data allocation is a critical aspect of distributed database systems: a poorly-designed data allocation can lead to inefficient computation, high access costs, and high network loads [15, 16] whereas a welldesig...
A Customizable Hybrid Approach to Data Clustering
- Proc. of the 2003 ACM Symposium on Applied Computing, 2003
Cited by 1 (1 self)
Most current data clustering algorithms in data mining are based on a distance calculation in a certain metric space. For Spatial Database Systems (SDBS), the Euclidean distance between two data points is often used to represent the relationship between data points. However, in some spatial settings and many other applications, distance alone is not enough to represent all the attributes of the relation between data points. We need a more powerful model to record more relational information between data objects. This paper adopts a graph model by which a database is regarded as a graph: each vertex of the graph represents a data point, and each edge, weighted or unweighted, is used to record the relation between two data points connected by the edge. Based on the graph model, this paper presents a set of cluster analysis criteria to guide data clustering. The criteria can be used to measure clustering results and help improve the quality of clustering. Further, a customizable algorithm using the criteria is proposed and implemented. This algorithm can produce clusters according to users' specifications. Preliminary experiments show encouraging results.
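The graph model described above (one vertex per data point, one weighted edge per relation) can be illustrated with a very simple stand-in criterion. The abstract does not state the paper's criteria, so the sketch below substitutes a plainly different technique: it keeps edges whose weight meets a user-chosen threshold and reports the connected components as clusters. The function name, the `min_weight` parameter, and the sample data are all hypothetical.

```python
def threshold_clusters(points, edges, min_weight):
    """Cluster points as connected components over sufficiently heavy edges.

    points:     iterable of hashable point ids.
    edges:      iterable of (u, v, weight) tuples.
    min_weight: edges below this weight are ignored (a user specification).
    """
    points = list(points)
    parent = {p: p for p in points}

    def find(p):
        # Union-find lookup with path halving.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for u, v, w in edges:
        if w >= min_weight:
            parent[find(u)] = find(v)

    groups = {}
    for p in points:
        groups.setdefault(find(p), []).append(p)
    return sorted(sorted(g) for g in groups.values())


pts = ["a", "b", "c", "d"]
rel = [("a", "b", 0.9), ("b", "c", 0.2), ("c", "d", 0.8)]
print(threshold_clusters(pts, rel, 0.5))
```

Changing `min_weight` changes the clusters produced, which is one simple way a user specification could steer the result.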
Automatic Database Clustering Using Data Mining, DEXA '06
- Proceedings of the 17th International Conference on Database and Expert Systems Applications, 2006
Cited by 1 (1 self)
Because of data proliferation, efficient access methods and data storage techniques have become increasingly critical to maintain an acceptable query response time. One way to improve query response time is to reduce the number of disk I/Os by partitioning the database vertically (attribute clustering) and/or horizontally (record clustering). A clustering is optimized for a given set of queries. However, in dynamic systems the queries change with time, the clustering in place becomes obsolete, and the database needs to be re-clustered dynamically. In this paper we discuss an efficient algorithm for attribute clustering that dynamically and automatically generates attribute clusters based on closed item sets mined from the attribute sets found in the queries running against the database.
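For readers unfamiliar with closed item sets, the following brute-force sketch shows what is being mined: an attribute set is closed when no proper superset occurs in exactly the same number of queries. This is not the paper's algorithm (it enumerates every subset and is exponential in the number of attributes); the function name, the `min_support` parameter, and the sample query workload are hypothetical.

```python
from itertools import combinations


def closed_itemsets(queries, min_support=1):
    """Return {attribute set: support} for all closed, frequent attribute sets.

    queries: iterable of attribute collections, one per observed query.
    """
    queries = [frozenset(q) for q in queries]
    items = sorted(set().union(*queries))

    # Support of every candidate attribute set that is frequent enough.
    support = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            count = sum(1 for q in queries if candidate <= q)
            if count >= min_support:
                support[candidate] = count

    # Keep only sets with no proper superset of equal support.
    return {
        s: c
        for s, c in support.items()
        if not any(s < t and support[t] == c for t in support)
    }


workload = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a", "b"}]
for attrs, count in sorted(closed_itemsets(workload, 2).items(), key=lambda x: -x[1]):
    print(sorted(attrs), count)
```

In this toy workload `{b}` is not closed because `{a, b}` appears in the same three queries, so the pair is the more informative candidate for an attribute cluster.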

