PNUTS: Yahoo!’s hosted data serving platform
- IN PROC. 34TH VLDB
, 2008
Cited by 60 (2 self)
We describe PNUTS, a massively parallel and geographically distributed database system for Yahoo!’s web applications. PNUTS provides data storage organized as hashed or ordered tables, low latency for large numbers of concurrent requests including updates and queries, and novel per-record consistency guarantees. It is a hosted, centrally managed, and geographically distributed service, and utilizes automated load-balancing and failover to reduce operational complexity. The first version of the system is currently serving in production. We describe the motivation for PNUTS and the design and implementation of its table storage and replication layers, and then present experimental results.
The End of an Architectural Era (It's Time for a Complete Rewrite)
- Proceedings of the 31st international
, 2005
Cited by 55 (9 self)
In previous papers [SC05, SBC+07], some of us predicted the end of “one size fits all” as a commercial relational DBMS paradigm. These papers presented reasons and experimental evidence that showed that the major RDBMS vendors can be outperformed by 1-2 orders of magnitude by specialized engines in the data warehouse, stream processing, text, and scientific database markets. Assuming that specialized engines dominate these markets over time, the current relational DBMS code lines will be left with the business data processing (OLTP) market and hybrid markets where more than one kind of capability is required. In this paper we show that current RDBMSs can be beaten by nearly two orders of magnitude in the OLTP market as well. The experimental evidence comes from comparing a new OLTP prototype, H-Store, which we have built at M.I.T., to a popular RDBMS on the standard transactional benchmark, TPC-C. We conclude that the current RDBMS code lines, while attempting to be a “one size fits all” solution, in fact excel at nothing. Hence, they are 25 year old legacy code lines that should be retired in favor of a collection of “from scratch” specialized engines. The DBMS vendors (and the research community) should start with a clean sheet of paper and design systems for tomorrow’s requirements, not continue to push code lines and architectures designed for yesterday’s needs.
OLTP Through the Looking Glass, and What We Found There
Cited by 18 (2 self)
Online Transaction Processing (OLTP) databases include a suite of features — disk-resident B-trees and heap files, locking-based concurrency control, support for multi-threading — that were optimized for computer technology of the late 1970’s. Advances in modern processors, memories, and networks mean that today’s computers are vastly different from those of 30 years ago, such that many OLTP databases will now fit in main memory, and most OLTP transactions can be processed in milliseconds or less. Yet database architecture has changed little. Based on this observation, we look at some interesting variants of conventional database systems that one might build that exploit recent hardware trends, and speculate on their performance through a detailed instruction-level breakdown of the major components involved in a transaction processing database system (Shore) running a subset of TPC-C. Rather than simply profiling Shore, we progressively modified it so that after every feature removal or optimization, we had a (faster) working system that fully ran our workload. Overall, we identify overheads and optimizations that explain a total difference of about a factor of 20x in raw performance. We also show that there is no single “high pole in the tent” in modern (memory resident) database systems, but that substantial time is spent in logging, latching, locking, B-tree, and buffer management operations. Categories and Subject Descriptors H.2.4 [Database Management]: Systems — transaction processing; concurrency.
Abbadi, “G-Store: A Scalable Data Store for Transactional Multi key Access in the Cloud,” in SOCC, 2010
Cited by 15 (7 self)
Cloud computing has emerged as a preferred platform for deploying scalable web-applications. With the growing scale of these applications and the data associated with them, scalable data management systems form a crucial part of the cloud infrastructure. Key-Value stores – such as Bigtable, PNUTS, Dynamo, and their open source analogues – have been the preferred data stores for applications in the cloud. In these systems, data is represented as Key-Value pairs, and atomic access is provided only at the granularity of single keys. While these properties work well for current applications, they are insufficient for the next generation web applications – such as online gaming, social networks, collaborative editing, and many more – which emphasize collaboration. Since collaboration by definition requires consistent access to groups of keys, scalable and consistent multi key access is critical for such applications. We propose the Key Group abstraction that defines a relationship between a group of keys and is the granule for on-demand transactional access. This abstraction allows the Key Grouping protocol to collocate control for the keys in the group to allow efficient access to the group of keys. Using the Key Grouping protocol, we design and implement G-Store which uses a key-value store as an underlying substrate to provide efficient, scalable, and transactional multi key access. Our implementation using a standard key-value store and experiments using a cluster of commodity machines show that G-Store preserves the desired properties of key-value stores, while providing multi key access functionality at a very low overhead.
Megastore: Providing Scalable, Highly Available Storage for Interactive Services
- CONFERENCE ON INNOVATIVE DATABASE RESEARCH (CIDR) 2011
, 2011
Cited by 15 (0 self)
Megastore is a storage system developed to meet the requirements of today’s interactive online services. Megastore blends the scalability of a NoSQL datastore with the convenience of a traditional RDBMS in a novel way, and provides both strong consistency guarantees and high availability. We provide fully serializable ACID semantics within fine-grained partitions of data. This partitioning allows us to synchronously replicate each write across a wide area network with reasonable latency and support seamless failover between datacenters. This paper describes Megastore’s semantics and replication algorithm. It also describes our experience supporting a wide range of Google production services built with Megastore.
ElasTraS: An Elastic, Scalable, and Self Managing Transactional Database for the Cloud
Cited by 12 (10 self)
Cloud computing has emerged as a pervasive platform for deploying scalable and highly available Internet applications. To facilitate the migration of data-driven applications to the cloud, elasticity, scalability, fault-tolerance, and self-manageability (henceforth referred to as cloud features) are fundamental requirements for database management systems (DBMS) driving such applications. Even though extremely successful in the traditional enterprise setting, relational databases (RDBMS) are not a competitive choice for cloud-bound applications, owing to the high cost of commercial relational database software and the lack of the desired cloud features in the open source counterparts. As a result, Key-Value stores have emerged as a preferred choice for scalable and fault-tolerant data management, but lack the rich functionality and transactional guarantees of RDBMS. We present ElasTraS, an Elastic TranSactional relational database, designed to scale out using a cluster of commodity machines while being fault-tolerant and self managing. ElasTraS is designed to support both classes of database needs for the cloud: (i) large databases partitioned across a set of nodes, and (ii) a large number of small and independent databases common in multi-tenant databases. ElasTraS borrows from the design philosophy of scalable Key-Value stores to minimize distributed synchronization and remove scalability bottlenecks, while leveraging decades of research on transaction processing, concurrency control, and recovery to support rich functionality and transactional guarantees. We present the design of ElasTraS, implementation details of our initial prototype system, and experimental results executing the TPC-C benchmark.
Rethinking Cost and Performance of Database Systems
- SIGMOD Rec
Cited by 10 (2 self)
Traditionally, database systems were optimized in the following way: "Given a set of machines, try to minimize the response time of each request." This paper argues that today, users would like a database system to optimize the opposite question: "Given a response time goal for each request, try to minimize the number of machines (i.e., cost in $)." Furthermore, this paper gives an example that demonstrates that the new optimization problem may result in a totally different system architecture.
Implementing Business Conversations with Consistency Guarantees using Message-oriented Middleware
Cited by 7 (5 self)
The paper considers distributed applications where interactions between constituent services take place via messages in an asynchronous environment with unpredictable communication and processing delays; further, interacting parties are not required to be online at the same time. Message-oriented middleware (MoM) is commonly used for connecting such loosely coupled distributed applications. Despite loose coupling, many service interactions have temporal and message validation constraints. A failure to deliver a valid message within its time constraint could cause mutually conflicting views of an interaction (one party regarding it as timely whilst the other party regards it as untimely), leading to application level inconsistencies. In a loosely coupled system, such inconsistencies could remain undetected for a long time, requiring costly application level recovery procedures. This paper describes how synchronisation support providing multilateral consistency guarantees can be provided using the underlying MoM to prevent inconsistencies from reaching application level.
Shore-MT: A Scalable Storage Manager for the Multicore Era
- EXTENDING DATABASE TECHNOLOGY (EDBT)
, 2009
Cited by 7 (5 self)
Database storage managers have long been able to efficiently handle multiple concurrent requests. Until recently, however, a computer contained only a few single-core CPUs, and therefore only a few transactions could simultaneously access the storage manager's internal structures. This allowed storage managers to use non-scalable approaches without any penalty. With the arrival of multicore chips, however, this situation is rapidly changing. More and more threads can run in parallel, stressing the internal scalability of the storage manager. Systems optimized for high performance at a limited number of cores are not assured similarly high performance at a higher core count, because unanticipated scalability obstacles arise. We benchmark four popular open-source storage managers (Shore, BerkeleyDB, MySQL, and PostgreSQL) on a modern multicore machine, and find that they all suffer in terms of scalability. We briefly examine the bottlenecks in the various storage engines. We then present Shore-MT, a multithreaded and highly scalable version of Shore which we developed by identifying and successively removing internal bottlenecks. When compared to other DBMS, Shore-MT exhibits superior scalability and 2-4 times higher absolute throughput than its peers. We also show that designers should favor scalability to single-thread performance, and highlight important principles for writing scalable storage engines, illustrated with real examples from the development of Shore-MT.
Towards Resource-Oriented BPEL
Cited by 6 (0 self)
Service orientation is the de-facto architectural style today. But what actually is a service, and how should service boundaries be chosen? Resource orientation, once seen as a “light-weight” approach to Web services, is reshaping itself as a modeling strategy to service orientation. Along comes the realization that resources are in fact complex state machines. Currently, there is no accepted standard for modeling the internal state of resources. In this paper, BPEL is proposed as a modeling language for resources and necessary extensions to BPEL are outlined.

