Results 1 - 3 of 3
Advance Resource Provisioning in Bulk Data Scheduling
"... Today’s scientific and business applications generate massive data sets that need to be transferred to remote sites for sharing, processing, and long term storage. Because of increasing data volumes and enhancement in current network technology that provide ondemand high-speed data access between co ..."
Abstract
Today’s scientific and business applications generate massive data sets that need to be transferred to remote sites for sharing, processing, and long-term storage. Because of increasing data volumes and enhancements in current network technology that provide on-demand high-speed data access between collaborating institutions, data handling and scheduling problems have reached a new scale. In this paper, we present a new data scheduling model with advance resource provisioning, in which data movement operations are defined with earliest start and latest completion times. We analyze the time-dependent resource assignment problem and propose a new methodology to improve current systems by allowing researchers and higher-level meta-schedulers to use data placement as a service, so they can plan ahead and submit transfer requests in advance. In general, scheduling with time and resource conflicts is NP-hard. We introduce an efficient algorithm to organize multiple requests on the fly while satisfying users’ time and resource constraints. We successfully tested our algorithm in a simple benchmark simulator that we developed, and demonstrated its performance with initial test results.
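To make the time-window idea concrete, the following Python sketch admits transfer requests with earliest-start and latest-completion constraints on a single shared link, in earliest-deadline order. This is only an illustration under assumed names and a fixed-capacity link, not the scheduling algorithm from the paper; the Request fields, the 10 Gbps capacity, and the exclusive-use simplification are all assumptions.

from dataclasses import dataclass

@dataclass
class Request:
    name: str
    volume_gb: float          # data to move, in GB
    earliest_start: float     # hours from now
    latest_completion: float  # deadline, hours from now

LINK_GBPS = 10.0                      # assumed single-link capacity
GB_PER_HOUR = LINK_GBPS * 3600 / 8    # rough conversion, ignores protocol overheads

def admit(requests):
    """Greedy earliest-deadline-first admission on one link.

    A request is accepted only if its volume fits inside its time window
    when the link is used exclusively after the previously accepted
    request finishes. The paper's model covers shared, time-dependent
    resources; this sketch deliberately does not.
    """
    accepted, busy_until = [], 0.0
    for r in sorted(requests, key=lambda r: r.latest_completion):
        start = max(r.earliest_start, busy_until)
        finish = start + r.volume_gb / GB_PER_HOUR
        if finish <= r.latest_completion:
            accepted.append((r.name, start, finish))
            busy_until = finish
    return accepted

if __name__ == "__main__":
    demo = [Request("climate-run", 9000, 0, 4),
            Request("genome-set", 2000, 1, 3),
            Request("backup", 20000, 0, 2)]
    for name, s, f in admit(demo):
        print(f"{name}: start {s:.2f} h, finish {f:.2f} h")

In this toy run the oversized "backup" request is rejected because it cannot finish within its window, while the other two are scheduled back to back; a real advance-provisioning scheduler would instead reshape bandwidth allocations over time rather than reject outright.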
Application-Level Optimization of Big Data Transfers Through Pipelining, Parallelism and Concurrency
"... Abstract—In end-to-end data transfers, there are several factors affecting the data transfer throughput, such as the network characteristics (e.g. network bandwidth, round-trip-time, background traffic); end-system characteristics (e.g. NIC capacity, number of CPU cores and their clock rate, number ..."
Abstract
In end-to-end data transfers, several factors affect the data transfer throughput, such as the network characteristics (e.g., network bandwidth, round-trip time, background traffic); end-system characteristics (e.g., NIC capacity, number of CPU cores and their clock rate, number of disk drives and their I/O rate); and the dataset characteristics (e.g., average file size, dataset size, file size distribution). Optimization of big data transfers over inter-cloud and intra-cloud networks is a challenging task that requires joint consideration of all of these parameters. This optimization task becomes even more challenging when transferring datasets comprised of heterogeneous file sizes (i.e., large and small files mixed). Previous work in this area focuses only on the end-system and network characteristics and does not provide models for the dataset characteristics. In this study, we analyze the effects of the three most important transfer parameters used to enhance data transfer throughput: pipelining, parallelism, and concurrency. We provide models and guidelines to set the best values for these parameters and present two different transfer optimization algorithms that use the developed models. The tests conducted over high-speed networking and cloud testbeds show that our algorithms outperform the most popular data transfer tools, such as Globus Online and UDT, in the majority of cases.
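To illustrate how the three knobs interact, here is a hypothetical Python sketch (not the authors' models or optimizer) that derives pipelining, parallelism, and concurrency levels from coarse dataset and network characteristics: datasets of small files get more pipelining to hide per-file round trips, large files get more parallel streams to fill the bandwidth-delay product, and concurrency is bounded by the end system's cores. The thresholds, the plan_transfer function, and the TransferPlan structure are assumptions for illustration only.

from dataclasses import dataclass
import math

@dataclass
class TransferPlan:
    pipelining: int   # files kept in flight per channel (hides per-file RTT)
    parallelism: int  # parallel streams per file (fills the BDP for big files)
    concurrency: int  # simultaneous file channels

def plan_transfer(avg_file_mb: float, bandwidth_gbps: float,
                  rtt_ms: float, cpu_cores: int) -> TransferPlan:
    """Heuristic sketch: derive transfer parameters from dataset and
    network characteristics. Not the paper's models; a rough
    illustration of how the three parameters play different roles."""
    # Bandwidth-delay product in MB: data one stream must keep in
    # flight to saturate the path.
    bdp_mb = bandwidth_gbps * 1e3 / 8 * (rtt_ms / 1e3)

    # Small files: per-file round trips dominate, so pipeline enough
    # files to cover roughly one BDP worth of data.
    pipelining = max(1, min(32, math.ceil(bdp_mb / max(avg_file_mb, 1e-3))))

    # Large files: split each file over several streams so a single
    # TCP window does not cap throughput.
    parallelism = max(1, min(8, math.ceil(avg_file_mb / max(bdp_mb, 1e-3))))

    # Concurrency: more channels for many-small-file datasets, bounded
    # by the end system's CPU cores.
    concurrency = max(1, min(cpu_cores, 4 if avg_file_mb < bdp_mb else 2))

    return TransferPlan(pipelining, parallelism, concurrency)

if __name__ == "__main__":
    # e.g. a 10 Gbps path, 50 ms RTT, mostly 5 MB files, 8-core host
    print(plan_transfer(avg_file_mb=5, bandwidth_gbps=10, rtt_ms=50, cpu_cores=8))

For the example inputs this yields high pipelining, a single stream per file, and four concurrent channels, matching the intuition that small-file datasets benefit more from pipelining and concurrency than from per-file parallelism.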
doi:10.1098/rsta.2011.0147 — PREFACE: Selected papers from the 2010 e-Science All Hands Meeting
"... The annual e-Science All Hands Meeting (AHM) is the premier e-Science conference held regularly in the United Kingdom, and provides a forum for the e-Science community to present and demonstrate their research, exchange ideas and socialize. This Theme Issue, entitled ‘e-Science: novel research, new ..."
Abstract
The annual e-Science All Hands Meeting (AHM) is the premier e-Science conference held regularly in the United Kingdom, and provides a forum for the e-Science community to present and demonstrate their research, exchange ideas, and socialize. This Theme Issue, entitled ‘e-Science: novel research, new science, and enduring impact’, features selected papers from AHM 2010 with the aim of highlighting some of the most innovative and interesting research of the e-Science community. The AHM typically attracts a few hundred participants from academia, industry, and commerce, with a technical programme of keynote presentations, regular sessions, workshops, poster sessions, and birds-of-a-feather meetings. As the AHMs have developed, they have adopted a broad interpretation of what is meant by e-Science, and that is reflected in the papers selected for this issue. The ninth e-Science All Hands Meeting (AHM 2010) was held on 13–15 September 2010.