NoSQL: The Death of the Star
Abstract. In recent years, the problems of using generic storage techniques for very specific applications have been detected and outlined. Thus, some alternatives to relational DBMSs (e.g. BigTable and C-Store) are blooming. On the other hand, cloud computing is already a reality that helps to save money by eliminating fixed hardware and software costs and charging only per use. Thus, specific software tools to exploit the cloud have also appeared. The trend in this case is to use implementations based on the MapReduce paradigm developed by Google. The basic goal of this talk is to introduce and discuss these ideas from the point of view of Data Warehousing and OLAP. We will see the advantages, disadvantages and some of the possibilities they offer.
Sigiri: Uniform Abstraction for Large-Scale Compute Resource Interactions
Abstract. Scientists who conduct mid-range computationally heavy modeling and analysis often scramble to find sufficient computational resources to test and run their codes. The science they carry out is not petascale or even terascale science, but the computational needs often go beyond what can be satisfied by their university. With the maturation of Grid computing facilities and the recent explosion of cloud computing data centers, mid-scale computational science has more options to satisfy computational needs. This paper focuses on a simple abstraction for interaction with heterogeneous resource managers spanning grid and cloud computing, and on features that make the tool useful for the mid-scale physical or natural scientist. A key aspect of the service is its support for multiple standard job specification languages and the ability for the user to interact directly with the service, removing the delay that can come through layers of services.
On the Elasticity of NoSQL Databases over Cloud Management Platforms
NoSQL databases focus on analytical processing of large-scale datasets, offering increased scalability over commodity hardware. One of their strongest features is elasticity, which allows for fairly portioned premiums and high-quality performance and directly applies to the philosophy of a cloud-based platform. Yet, the process of adaptive expansion and contraction of resources usually involves a lot of manual effort during cluster configuration. To date, there exists no comparative study to quantify this cost and measure the efficacy of NoSQL engines that offer this feature over a cloud provider. In this work, we present a cloud-enabled framework for adaptive monitoring of NoSQL systems. We perform a study of the elasticity feature on some of the most popular NoSQL databases over an open-source cloud platform. Based on these measurements, we finally present a prototype implementation of a decision-making system that enables automatic elastic operations of any NoSQL engine based on administrator or application-specified constraints.
Usage Patterns to Provision for Scientific Experimentation in Clouds
Abstract. Driven by the need to provision resources on demand, scientists are turning to commercial and research test-bed Cloud computing resources to run their scientific experiments. Job scheduling on cloud computing resources, unlike on earlier platforms, is a balance between throughput and cost of executions. Within this context, we posit that usage patterns can improve job execution, because these patterns allow a system to plan, stage and optimize scheduling decisions. This paper introduces a novel approach to utilizing user patterns drawn from knowledge-based techniques to improve execution across a series of active workflows and jobs in cloud computing environments. Using empirical analysis, we establish the accuracy of our prediction approach for two different workloads and demonstrate how this knowledge can be used to improve job executions.
Principles of Distributed Data Management in 2020?
Abstract. With the advent of high-speed networks, fast commodity hardware, and the web, distributed data sources have become ubiquitous. The third edition of the Özsu-Valduriez textbook Principles of Distributed Database Systems [10] reflects the evolution of distributed data management and distributed database systems. In this new edition, the fundamental principles of distributed data management could still be presented based on the three dimensions of earlier editions: distribution, heterogeneity and autonomy of the data sources. In retrospect, the focus on fundamental principles and generic techniques has been useful not only to understand and teach the material, but also to enable an infinite number of variations. The primary application of these generic techniques has obviously been distributed and parallel DBMS versions. Today, to support the requirements of important data-intensive applications (e.g. social networks, web data analytics, scientific applications, etc.), new distributed data management techniques and systems (e.g. MapReduce, Hadoop, SciDB, Peanut, Pig Latin, etc.) are emerging and receiving much attention from the research community. Although they do well in terms of consistency/flexibility/performance trade-offs for specific applications, they seem to be ad hoc and might hurt data interoperability. The key questions I discuss are: What are the fundamental principles behind the emerging solutions? Is there any generic architectural model to explain those principles? Do we need new foundations to look at data distribution?
Policy Driven Data Management in PL-Grid Virtual Organizations
Abstract. In this chapter, we introduce a novel approach to data management within the Grid environment based on user-defined storage policies. This approach aims at enabling Grid application developers to specify requirements concerning the storage elements that the application will use at runtime. Most existing Grid middleware focuses on unifying access to available storage elements, e.g. by applying various virtualization techniques. While this is suitable for many Grid applications, there is a category of applications, namely data-intensive ones, which often have more specific needs. The chapter outlines research and development work carried out in the PL-GRID and OntoStor projects to solve this issue within the PL-GRID infrastructure.

