A Framework for Reliable and Efficient Data Placement in Distributed Computing Systems
 Journal of Parallel and Distributed Computing
, 2005
Abstract

Data placement is an essential part of today’s distributed applications, since moving the data close to the application has many benefits. The increasing data requirements of both scientific and commercial applications, and collaborative access to these data, make it even more important. In the current approach, data placement is regarded as a side effect of computation. Our goal is to make data placement a first-class citizen in distributed computing systems, just like computational jobs. Data placement jobs will be queued, scheduled, monitored, managed, and even checkpointed. Since data placement jobs have different characteristics from computational jobs, they cannot be treated in exactly the same way. For this purpose, we propose a framework that can be considered a “data placement subsystem” for distributed computing systems, similar to the I/O subsystem in operating systems. This framework includes a specialized scheduler for data placement, a high-level planner aware of data placement jobs, a resource broker/policy enforcer, and some optimization tools. Our system can perform reliable and efficient data placement, recover from all kinds of failures without any human intervention, and dynamically adapt to the environment at execution time.
Key words. Distributed computing, reliable and efficient data placement, scheduling, runtime adaptation, protocol autotuning, data intensive applications, I/O subsystem.
An Overview of Data Replication on the Internet
 In Proc. of the International Symposium on Parallel Architectures, Algorithms and Networks (ISPAN)
, 2002
Abstract

The proliferation of the Internet is leading to high expectations of fast turnaround time. Clients abandoning their connections due to excessive download delays translate directly into profit losses. Hence, minimizing the latency perceived by end users has become the primary performance objective, ahead of more traditional issues such as server utilization. Two promising techniques for improving Internet responsiveness are caching and replication. In this paper we present an overview of recent research in replication. We begin by arguing for the important role of replication in decreasing client-perceived response time and proceed by illustrating the main topics that affect its successful deployment on the Internet. We analyze and characterize existing research, providing taxonomies and classifications whenever possible. Our discussion reveals several open problems and research directions.
Data Placement in Widely Distributed Systems
, 2005
Abstract

The unbounded increase in the computation and data requirements of scientific applications has necessitated the use of widely distributed compute and storage resources to meet the demand. In such an environment, data is no longer locally accessible and thus has to be remotely retrieved and stored. Efficient and reliable access to data sources and archiving destinations in a widely distributed environment brings new challenges. Placing data on temporary local storage devices offers many advantages, but such “data placements” also require careful management of storage resources and data movement, i.e. allocating storage space, staging-in of input data, staging-out of generated data, and deallocation of local storage after the data is safely stored at the destination. Existing systems closely couple data placement and computation, and consider data placement a side effect of computation. Data placement is either embedded in the computation, delaying it, or performed by simple scripts that do not have the privileges of a job. In this dissertation, we propose a framework that decouples computation and data placement, allows asynchronous execution of each, and treats data
Geometric engineering of (framed) BPS states
Abstract

BPS quivers for N = 2 SU(N) gauge theories are derived via geometric engineering from derived categories of toric Calabi-Yau threefolds. While the outcome is in agreement with previous low energy constructions, the geometric approach leads to several new results. An absence of walls conjecture is formulated for all values of N, relating the field theory BPS spectrum to large radius D-brane bound states. Supporting evidence is presented as explicit computations of BPS degeneracies in some examples. These computations also prove the existence of BPS states of arbitrarily high spin and infinitely many marginal stability walls at weak coupling. Moreover, framed quiver models for framed BPS states are naturally derived from this formalism, as well as a mathematical formulation of framed and unframed BPS degeneracies in terms of motivic and cohomological Donaldson-Thomas invariants. We verify the conjectured absence of BPS states with “exotic” SU(2)_R quantum numbers using motivic DT invariants. This application is based in particular on a complete recursive algorithm which determines the unframed BPS spectrum at any point on the Coulomb branch in terms of noncommutative Donaldson-Thomas invariants for framed quiver representations.
Notes on a new construction of hyperkahler metrics
Abstract

In joint work with Davide Gaiotto and Greg Moore [1] we recently proposed a new connection between hyperkähler geometry and the counting of BPS states in supersymmetric field theory. While the story is motivated by physics, it leads to a concrete new recipe for constructing complete hyperkähler metrics on the total spaces of certain complex integrable systems. The aim of this note is briefly to describe what this recipe is, and to comment on some of the issues involved in converting it into an actual theorem. Let us briefly describe some of the highlights.
• We begin with a collection of “integrable system data” described in Section 2.1 below. These data include a complex manifold B containing a divisor D. For example, B could be the complex plane, and D some collection of points. The data also include a local system of lattices Γ over B′ = B \ D, from which we build a 2r-torus bundle M′ over B′, with nontrivial monodromy around D. Finally, we have a “central charge” homomorphism Z: Γ → C, varying holomorphically over B′. From these data we build a simple explicit hyperkähler metric g^sf on M′. However, the metric g^sf is incomplete, and our main interest is in complete metrics.
• Naively we might hope to complete g^sf by adding some degenerate torus fibers over D, thus extending M′ to M ⊃ M′, in such a way that g^sf will extend to M. However, it seems that this is impossible: roughly speaking, g^sf is too homogeneous to have such an extension. Instead, we construct a new metric g on M′, which differs from g^sf by certain “quantum corrections.”
• The quantum corrections are obtained by solving a certain explicit integral equation, (4.8) below. The main new ingredient in this equation is a set of integer “invariants” Ω(γ), which should be examples of generalized Donaldson-Thomas invariants in the sense of [2, 3]. In particular, the Kontsevich-Soibelman wall-crossing formula for generalized Donaldson-Thomas invariants, as written in [2], plays an important role in the construction. Indeed the original motivation for this construction was an attempt to understand the physical meaning of the formula of [2].
Spectral networks and snakes
Abstract

We apply and illustrate the techniques of spectral networks in a large collection of A_{K−1} theories of class S, which we call “lifted A_1 theories.” Our construction makes contact with Fock and Goncharov’s work on higher Teichmüller theory. In particular we show that the Darboux coordinates on moduli spaces of flat connections which come from certain special spectral networks coincide with the Fock-Goncharov coordinates. We show, moreover, how these techniques can be used to study the BPS spectra of lifted A_1 theories. In particular, we determine the spectrum generators for all the lifts of a simple superconformal field theory.
Runtime Adaptation of Grid Data Placement Jobs
 In Proceedings of Int. Workshop on Adaptive Grid Middleware
, 2003
Abstract
The grid presents a continuously changing environment. It also introduces a new set of failures. The data grid initiative has made it possible to run data-intensive applications on the grid. Data-intensive grid applications consist of two parts: a data placement part and a computation part. The data placement part is responsible for transferring the input data to the compute node and the result of the computation to the appropriate storage system. While work has been done on making computation adapt to changing conditions, little has been done on making data placement do the same.