On the Performance of Object Clustering Techniques
Cited by 65 (0 self)
We investigate the performance of some of the best-known object clustering algorithms on four different workloads based upon the Tektronix benchmark. For all four workloads, stochastic clustering gave the best performance for a variety of performance metrics. Since stochastic clustering is computationally expensive, it is interesting that for every workload there was at least one cheaper clustering algorithm that matched or almost matched stochastic clustering. Unfortunately, for each workload, the algorithm that approximated stochastic clustering was different. Our experiments also demonstrated that even when the workload and object graph are fixed, the choice of the clustering algorithm depends upon the goals of the system. For example, if the goal is to perform well on traversals of small portions of the database starting with a cold cache, the important metric is the per-traversal expansion factor, and a well-chosen placement tree will be nearly optimal; if the goal is to achieve a...
Enhancing Performance in a Persistent Object Store: Clustering Strategies in O_2
, 1995
Cited by 44 (3 self)
We address the problem of clustering complex data on disk to minimize the number of I/O operations in data intensive applications. We first focus on the problems related to the design and implementation of clustering strategies. We then propose a set of clustering strategies as well as an algorithm which implements them for the O_2 system. 1 Introduction New developments, both in the database field and in the programming languages field, have led to the design of new database management systems [Ba88], [Ki89], [Deux90]. These systems have the following characteristics: a complex object model [LR89a], a persistent programming language [AB87], and an object management system [VBD89]. Object management systems have to fulfill the following requirements: (i) efficient management of large numbers of (large) objects; (ii) object sharing and versioning; and (iii) the usual database functionality such as transaction management, concurrency control and recovery. In this paper, we are intereste...
Query Processing in Tertiary Memory Databases
- IN PROC. OF THE 21ST INT. CONF. ON VERY LARGE DATA BASES
, 1996
Client Cache Management in a Distributed Object Database
, 1995
Cited by 18 (3 self)
A distributed object database stores objects persistently at servers. Applications run on client machines, fetching objects into a client-side cache of objects. If fetching and cache management are done in terms of objects, rather than fixed-size units such as pages, three problems must be solved: 1. which objects to prefetch, 2. how to translate, or swizzle, inter-object references when they are fetched from server to client, and 3. which objects to displace from the cache. This thesis reports the results of experiments to test various solutions to these problems. The experiments use the runtime system of the Thor distributed object database and benchmarks adapted from the Wisconsin OO7 benchmark suite. The thesis establishes the following points: 1. For plausible workloads involving some amount of object fetching, the prefetching policy is likely to have more impact on performance than swizzling policy or cache management policy. 2. A simple breadth-first prefetcher can have performa...
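A breadth-first prefetcher of the kind mentioned in this abstract can be illustrated with a short sketch. This is not Thor's implementation: the `references` mapping, the function name `breadth_first_prefetch`, and the `limit` parameter are all hypothetical, and the sketch assumes the server knows each object's outgoing references and that `limit` is at least 1.

```python
from collections import deque

def breadth_first_prefetch(references, root, limit):
    """Return up to `limit` object IDs to ship to the client, starting at
    the requested object `root` and following references breadth-first.

    `references` is a hypothetical mapping: object ID -> list of the
    object IDs it refers to. Assumes limit >= 1.
    """
    seen = {root}
    order = [root]          # objects selected for the reply, in BFS order
    queue = deque([root])
    while queue and len(order) < limit:
        oid = queue.popleft()
        for target in references.get(oid, ()):
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append(target)
                if len(order) == limit:
                    break
    return order

graph = {"a": ["b", "c"], "b": ["d"], "c": ["d", "e"]}
print(breadth_first_prefetch(graph, "a", 4))
```

The budget caps how many objects accompany each fetch; a real prefetcher would presumably also skip objects already present in the client cache, which this sketch does not model.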
A parallel algorithm for record clustering
- ACM Trans. on Database Systems
, 1990
Cited by 7 (1 self)
We present an efficient heuristic algorithm for record clustering that can run on a SIMD machine. We introduce the P-tree, and its associated numbering scheme, which in the split phase allows each processor independently to compute the unique cluster number of a record satisfying an arbitrary query. We show that by restricting ourselves in the merge phase to combining only sibling clusters, we obtain a parallel algorithm whose speedup ratio is optimal in the number of processors used. Finally, we report on experiments showing that our method produces substantial savings in an environment with relatively little overlap among the queries.
Research Issues in Automatic Database Clustering
- SIGMOD RECORD
Cited by 4 (0 self)
While a lot of work has been published on clustering of data on storage media, little has been done about automating this process. This is an important area because, with data proliferation, human attention has become a precious and expensive resource. Our goal is to develop an automatic and dynamic database clustering technique that will dynamically re-cluster a database with little intervention from a database administrator (DBA) and maintain an acceptable query response time at all times. In this paper we describe the issues that need to be solved when developing such a technique.
A Distributed Clustering Algorithm for Web-Based Access Patterns
- In Workshop on distributed and
, 2000
Cited by 2 (0 self)
We introduce a distributed document clustering algorithm based on user access patterns for multi-server Web sites. Our algorithm makes it possible to exploit simultaneously adaptive document replication and persistent connections, two techniques that are most effective in decreasing the response time observed by Web users. The algorithm first distributes the user access data evenly among the servers by using a hash function. Then, each server generates a local clustering on its fair share of the user session records by employing a traditional single-machine document clustering algorithm. Finally, those local clustering results are combined by using a novel procedure that generates maximal large itemsets of Web documents. We present preliminary experimental results and discuss alternative approaches to be pursued in the future.
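The first step of this abstract, spreading user access data across servers with a hash function, might look like the sketch below. The abstract does not say which hash function or record layout the authors use, so the choice of SHA-256, the `(session_id, documents)` record shape, and the names `assign_server` and `partition_sessions` are assumptions made for illustration only.

```python
import hashlib
from collections import defaultdict

def assign_server(session_id: str, num_servers: int) -> int:
    """Map a session ID to a server index using a stable hash.

    SHA-256 is an assumption; any hash that spreads keys roughly
    uniformly would serve the same purpose.
    """
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_servers

def partition_sessions(sessions, num_servers):
    """Group (session_id, documents) records by their assigned server."""
    shards = defaultdict(list)
    for session_id, documents in sessions:
        shards[assign_server(session_id, num_servers)].append((session_id, documents))
    return dict(shards)

# Hypothetical session records: each lists the documents one user visited.
sessions = [
    ("s1", ["/index.html", "/news.html"]),
    ("s2", ["/index.html", "/faq.html"]),
    ("s3", ["/news.html", "/sports.html"]),
]
shards = partition_sessions(sessions, num_servers=2)
```

Each server would then run its local clustering on its own shard. The final merge step based on maximal large itemsets is not sketched here, since the abstract gives no detail on it.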
A Comparison of Group-based and Object-based Data Clustering Techniques
- Eighth International Database Workshop, Data Mining, Data Warehousing and Client/Server Databases., Hong Kong, Springer-Verlag Singapore
, 1997
Cited by 1 (0 self)
We investigate the behavior and stability of two data clustering methods. The Slink method is based on a single object comparison test, whereas the Ward method is based on comparing groups of objects. Different sensitivity parameters are used to test the behavior and stability of the two techniques when clustering objects are drawn from the 3-D space. The objects are generated according to three commonly used statistical distributions. This study tends to confirm the findings of earlier (less exhaustive) studies, namely, the high similarity in behavior and stability among clustering methods. 1 Introduction Clustering applications range from databases, e.g., data clustering and data mining, to machine learning, to data compression [JMM96]. In the case of database clustering, the ability to categorize objects into groups allows the re-allocation of related data in a database in order to improve the performance of DBMSs. Data records which are frequently referenced together are moved i...
A Method for the Horizontal Partition of Object-Oriented Databases
The partitioning of related objects should be performed before clustering for an efficient access in object-oriented databases. In this paper, a partition of related objects in object-oriented databases is presented. All subclass nodes in a class inheritance hierarchy of a schema graph are condensed to a class node in the graph because the aggregation hierarchy has more influence on the partition than the class inheritance hierarchy. This reduced graph is called a condensed schema graph. A set function and an accessibility function are defined to characterize a maximal subset of related objects among the set of objects in a class. A set function maps a subset of the domain class objects to a subset of the range class objects. An accessibility function maps a subset of the objects of a class into a subset of the objects of the same class through a composition of set functions. A partition algorithm is derived to find the related objects of a condensed schema graph using accessibility fu...
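The set function and accessibility function described in this abstract can be illustrated with a small sketch. The paper's formal definitions are not reproduced in the abstract, so this follows only its wording: a set function maps a subset of one class's objects to a subset of another's, and an accessibility function is a composition of set functions that starts and ends in the same class. The names `make_set_function` and `compose`, the link dictionaries, and the example objects are hypothetical.

```python
def make_set_function(links):
    """Build a set function from `links`, a dict mapping each domain-class
    object to the set of range-class objects it refers to (assumed layout)."""
    def set_function(subset):
        result = set()
        for obj in subset:
            result |= links.get(obj, set())
        return result
    return set_function

def compose(*set_functions):
    """Apply the given set functions left to right."""
    def composed(subset):
        current = set(subset)
        for f in set_functions:
            current = f(current)
        return current
    return composed

# Hypothetical aggregation links between class A and class B objects.
a_to_b = make_set_function({"a1": {"b1"}, "a2": {"b1"}, "a3": {"b2"}})
b_to_a = make_set_function({"b1": {"a1", "a2"}, "b2": {"a3"}})

# An accessibility function on class A: A -> B -> A.
accessibility = compose(a_to_b, b_to_a)
related = accessibility({"a1"})  # objects of A that share a B object with a1
```

In this example `a1` and `a2` end up in the same group because both refer to `b1`, while `a3` stays separate, which matches the idea of grouping related objects before clustering.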
An Analysis Of File Space Properties Using Clustering
, 2002
Clustering is a technique used in information retrieval to improve both search quality and response time. The latter is achieved by placing similar documents/records close together in the file space, which in turn reduces the total number of block transfers needed to retrieve a given record set. A file space which has been restructured in this way may be regarded as having a high space density relative to one in which the records are randomly distributed, space density being defined as a file property relating to the measure of record similarity or closeness.

