Results 1 - 8 of 8
Dbfarm: A scalable cluster for multiple databases
- In Middleware, 2006
Cited by 14 (1 self)
Abstract. In many enterprise application integration scenarios, middleware has been instrumental in taking advantage of the flexibility and cost efficiency of clusters of computers. Web servers, application servers, platforms such as CORBA, J2EE or .NET, message brokers, and TP-Monitors, to mention just a few examples, are all forms of middleware that exploit and are built for distributed deployment. The one piece in the puzzle that largely remains a centralized solution is the database. There is, of course, much work on scaling and parallelizing databases. In fact, several products support deployment on clusters. Clustered databases, however, place the emphasis on single applications and target very large databases. By contrast, the middleware platforms just mentioned use clustered deployment not only for scalability but also for efficiently supporting multiple concurrent applications. In this paper we tackle the problem of clustered deployment of a database engine for supporting multiple applications. In the database case, multiple applications imply multiple, different database instances being used concurrently. In the paper we show how to build such a system and demonstrate its ability to support up to 300 different databases without loss of performance.
Boosting database replication scalability through partial replication and 1-copy-snapshot-isolation
- In PRDC’07
Cited by 8 (1 self)
Databases have become a crucial component in modern information systems. At the same time, they have become the main bottleneck in most systems. Database replication protocols have been proposed to solve the scalability problem by scaling out in a cluster of sites. Current techniques have attained some degree of scalability; however, existing approaches have two main limitations. Firstly, most solutions adopt a full replication model where all sites store a full copy of the database. The coordination overhead imposed by keeping all replicas consistent allows such approaches to achieve only medium scalability. Secondly, most replication protocols rely on the traditional consistency criterion, 1-copy-serializability, which limits concurrency and thus the scalability of the system. In this paper, we first assess analytically the performance gains that can be achieved by various partial replication configurations, i.e., configurations where not all sites store all data. From there, we derive a partial replication protocol that provides 1-copy-snapshot isolation as its correctness criterion. We have evaluated the protocol with TPC-W, and the results show better scalability than full replication.
DTR: Distributed Transaction Routing in a Large Scale Network
- High Performance Computing for Computational Science, 2008
Cited by 3 (3 self)
Grid systems provide access to huge storage and computing resources at large scale. While they have been mainly dedicated to scientific computing for years, grids are now considered a viable solution for hosting data-intensive applications. To this end, databases are replicated over the grid in order to achieve high availability and fast transaction processing thanks to parallelism. However, achieving both fast and consistent data access on such architectures is challenging in many respects. In particular, centralized control is ruled out because of its vulnerability and lack of efficiency at large scale. In this article, we propose a novel solution for the distributed control of transaction routing in a large-scale network. We leverage a cluster-oriented routing solution with a fully distributed approach that uses a large-scale distributed directory to handle routing metadata. Moreover, we demonstrate the feasibility of our implementation through experimentation: results show linear scale-up, and transaction routing is fast enough to make our solution suitable for update-intensive applications such as worldwide online booking.
Transpeer: Adaptive Distributed Transaction Monitoring for Web2.0 applications
- In Dependable and Adaptive Distributed Systems Track of the ACM Symposium on Applied Computing (SAC DADS), 2010
Cited by 2 (1 self)
In emerging Web2.0 applications such as virtual worlds or social networking websites, the number of users is very large (tens of thousands), hence the amount of data to manage is huge and dependability is a crucial issue. The large scale rules out centralized approaches as well as locking/two-phase-commit approaches. Moreover, Web2.0 applications are mostly interactive, which means that the response time must always be less than a few seconds. To address these problems, we present a novel solution, TransPeer, that allows applications to scale up without the need to buy expensive resources at a data center. To this end, databases are replicated over a P2P system in order to achieve high availability and fast transaction processing thanks to parallelism. A distributed shared dictionary, implemented on top of a DHT, contains metadata used for routing transactions efficiently. Both metadata and data are accessed optimistically: there is no locking on metadata, and transactions are executed tentatively on nodes. We demonstrate the feasibility of our approach through experimentation.
Database Replication in Large Scale Systems: Optimizing the Number of Replicas
In distributed systems, replication is used for ensuring availability and increasing performance. However, the heavy workload of distributed systems such as Web2.0 applications or Global Distribution Systems limits the benefit of replication if its degree (i.e., the number of replicas) is not controlled. Since every replica must eventually perform all updates, there is a point beyond which adding more replicas does not increase the throughput, because every replica is saturated by applying updates. Moreover, if the replication degree exceeds the optimal threshold, the surplus replicas generate overhead due to extra communication messages. In this paper, we propose a replication management solution that reduces the number of useless replicas. To this end, we define two mathematical models which approximate the number of replicas needed to achieve a given level of performance. Moreover, we demonstrate the feasibility of our replication management model through simulation. The results show the effectiveness of our models and their accuracy.
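The saturation effect described in this abstract can be illustrated with a back-of-the-envelope sketch. This is not either of the paper's two models, which the abstract does not spell out. It assumes a hypothetical per-replica capacity `capacity` (operations per second), a workload in which a fraction `update_fraction` of requests are updates, reads spread evenly over replicas, and every replica applying every update. The function names are illustrative only.

```python
import math


def max_throughput(n_replicas: int, capacity: float, update_fraction: float) -> float:
    """Highest total request rate T that n replicas sustain in this toy model.

    Each replica applies every update (w * T) and serves an equal share of
    the reads ((1 - w) * T / n). Keeping that load within `capacity` gives
    T <= capacity / (w + (1 - w) / n).
    """
    w = update_fraction
    return capacity / (w + (1.0 - w) / n_replicas)


def min_replicas(target: float, capacity: float, update_fraction: float):
    """Smallest replica count that reaches `target`, or None if unreachable."""
    w = update_fraction
    spare = capacity - w * target  # capacity left for reads on each replica
    if spare <= 0:
        return None  # updates alone already saturate every replica
    return max(1, math.ceil((1.0 - w) * target / spare))
```

Under these assumptions, throughput approaches but never reaches `capacity / update_fraction` as replicas are added, so a target at or above that bound has no solution regardless of the replication degree.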
Towards serializable . . .
, 2007
Towards serializable replication with snapshot isolation. Replicated database systems necessarily deal with multiple versions of data items active concurrently across nodes in a replication group. As a consequence, there is a natural fit between replication and snapshot isolation (SI), which uses multiple versions of data within a single site to provide non-blocking read operations. However, snapshot isolation does not guarantee serializable execution for arbitrary applications. Recent theory has established necessary and sufficient conditions on applications under which SI does guarantee serializability. This paper describes a novel replication algorithm using snapshot isolation within each node in a replication group to provide snapshot isolation to applications using the group. Update transactions can be initiated at any node in the group and a “master” node is transparently elected to detect conflicts between updating transactions. All reads can be satisfied locally at replicas, and in particular pure read-only transactions do not require any interaction with the master node. Committed updates are propagated to replicas and can be applied without blocking due to application activity. The algorithm described here is being implemented and evaluated as an extension to the replication support built into Oracle Berkeley DB. The new algorithm overcomes a limitation that required applications using Berkeley DB’s replication to route updates to the current “master” node and improves scalability by allowing reads that are part of an update transaction to be performed at replicated nodes. Further, this paper describes a tool that analyzes transaction logs applying recent results in serializability theory to automatically determine whether SI anomalies could occur in the executed set of transactions. Used together, the replication algorithm and the anomaly detection tool provide scalable replication with non-blocking reads that ensures serializable execution.
Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Very Large Data
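The abstract above does not give the details of how the master node detects conflicts between updating transactions. As a minimal sketch, the following uses the standard first-committer-wins check commonly associated with snapshot isolation: a transaction aborts if any item it wrote was committed by another transaction after its snapshot was taken. The class and method names are hypothetical and are not part of the Berkeley DB API.

```python
class Certifier:
    """Toy master-side conflict detector (first-committer-wins).

    Hypothetical illustration only; not the paper's algorithm.
    """

    def __init__(self):
        self.commit_counter = 0   # logical commit timestamp
        self.last_writer = {}     # item key -> commit timestamp of its last write

    def begin(self) -> int:
        """Return the snapshot timestamp for a new update transaction."""
        return self.commit_counter

    def try_commit(self, snapshot_ts: int, write_set: set) -> bool:
        """Commit unless an item in write_set was committed after the snapshot."""
        for key in write_set:
            if self.last_writer.get(key, 0) > snapshot_ts:
                return False      # a concurrent writer committed first: abort
        self.commit_counter += 1
        for key in write_set:
            self.last_writer[key] = self.commit_counter
        return True
```

In this sketch only update transactions contact the certifier; read-only transactions never call `try_commit`, which mirrors the abstract's point that they need no interaction with the master node.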
Pangea: An Eager Database Replication Middleware guaranteeing Snapshot Isolation without Modification of Database Servers
, 2009
Recently, several middleware-based approaches have been proposed. If we implement all functionalities of database replication only in a middleware layer, we can avoid the high cost of modifying existing database servers or building one from scratch. However, it is a big challenge to design middleware that enhances performance and scalability without modifying the database servers, because this restriction may cause extra overhead. Unfortunately, many existing middleware-based approaches suffer from shortcomings: some cause a hidden deadlock, some provide only table-level locking, some rely on total order communication tools, and others need to modify existing database servers. In this paper, we propose Pangea, a new eager database replication middleware guaranteeing snapshot isolation that solves the drawbacks of existing middleware by exploiting the property of the first-updater-wins rule. We have implemented a prototype of Pangea on top of unmodified PostgreSQL servers. An advantage of Pangea is that it uses less than 2000 lines of C code. Our experimental results with the TPC-W benchmark reveal that, compared to an existing middleware guaranteeing snapshot isolation without modification of database servers, Pangea provides better performance in terms of throughput and scalability.
DataGuide-based Distribution for XML Documents
Distribution is a well-known solution for increasing performance and providing load balancing when optimal resource utilization is needed. Together with replication, it also improves reliability, accessibility and fault tolerance. However, since the amount of data is large, there is a problem of maintaining meta-information about distribution and finding the needed data fragments during query execution. These problems are well understood, but they have not received much attention in the context of XML data management. This paper presents research in progress that examines the possibility of managing meta-information about XML data distribution by extending an auxiliary index structure called DataGuide.

