Mapping-Driven Data Access
BibTeX
@MISC{Melnik_mapping-drivendata,
author = {Sergey Melnik},
title = {Mapping-Driven Data Access},
year = {}
}
Abstract
Virtually all modern enterprise applications manipulate data represented in multiple formats, such as objects in a programming language, rows in relational databases, or XML structures, which are exposed through distinct programming models. As a result, application developers face a constant challenge of translating data and data access operations across different data representations. Even in simple object-to-relational mapping scenarios, where a set of objects is partitioned across several relational tables, we can end up with transformations that require outer joins, nested queries, and case statements in order to reassemble objects from tables (we will see an example of that shortly). Unsurprisingly, implementing such transformations in real applications is difficult, especially if the data needs to be updatable. Supporting updates is a common requirement, since many enterprise applications manage operational data. A common way of shielding the business logic from the intricacies of data manipulation is to use a data access layer that encapsulates the necessary transformations. Typically, the job of a data access layer is to provide an updatable client-side view that exposes persistent data as business entities. The backbone of a general-purpose data access layer is a mapping that establishes a relationship between data represented according to different schemas and data models. In contrast to hardwiring the data access code for a specific application, a mapping is used to parameterize the transformation logic or to generate it automatically. Today, mappings between different formats are deployed as first-class programming artifacts in enterprise systems. Therefore, mapping-driven data access, i.e., the problem of translating data and data access operations over mappings, is of critical importance to the developers of enterprise applications. Solutions to this problem have proven to be elusive, measured both in terms of commercial successes and elapsed research effort.
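To make the object-to-relational scenario above concrete, the following is a minimal sketch, not the paper's own example. It assumes a hypothetical two-class hierarchy (`Person`, `Employee`) whose instances are partitioned across two hypothetical tables (`person`, `employee`), and uses Python's standard `sqlite3` module. Even in this small case, reassembling the objects takes an outer join and a CASE expression.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Person:
    id: int
    name: str


@dataclass
class Employee(Person):
    dept: str


# Hypothetical schema: every object has a row in `person`;
# employees additionally have a row in `employee` with the same id.
SCHEMA_AND_DATA = """
CREATE TABLE person   (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE employee (id INTEGER PRIMARY KEY, dept TEXT NOT NULL);
INSERT INTO person   VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO employee VALUES (2, 'Sales');
"""

# The outer join keeps persons that have no employee row; the CASE
# expression computes which class each joined row should become.
REASSEMBLE = """
SELECT p.id, p.name, e.dept,
       CASE WHEN e.id IS NOT NULL THEN 'Employee' ELSE 'Person' END AS kind
FROM person AS p
LEFT OUTER JOIN employee AS e ON e.id = p.id
ORDER BY p.id
"""


def load_people(conn):
    """Rebuild Person/Employee objects from the two tables."""
    people = []
    for pid, name, dept, kind in conn.execute(REASSEMBLE):
        if kind == "Employee":
            people.append(Employee(pid, name, dept))
        else:
            people.append(Person(pid, name))
    return people


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA_AND_DATA)
    try:
        return load_people(conn)
    finally:
        conn.close()
```

The sketch covers only the read path. Translating updates on the objects back into changes on both tables, which the abstract identifies as the harder requirement, is deliberately left out.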
The industry landscape is covered with carcasses of failed data access products and in-house projects. Products that made their way to the market offer widely varying degrees of capability, robustness, and total cost of ownership. Industry experts give mixed advice about purchasing off-the-shelf solutions vs. developing homegrown ones. What is so hard about mapping-driven data access? Consider object-to-relational mappers (O/R-Ms), a widely used data access technology. It is relatively straightforward to build an O/R-M that uses a one-to-one mapping to expose each row in a relational table as an object, especially if no declarative data manipulation is required. However, as more complex mappings, set-based operations, performance, multi-DBMS-vendor support, and other requirements weigh in, ad hoc solutions quickly grow out of hand. This is not surprising, since mapping-driven data access lies at the intersection of two longstanding data management problems:
• Impedance mismatch, the problem of accessing persistent storage through programming language abstractions. Its focus is on bridging the gap between two distinct data models (e.g., relational and object-oriented), usually with minimal data reshaping.
• Data integration, the problem of providing unified access to heterogeneous data. Here, the focus is on data reshaping, usually within a common data model, assuming that the impedance mismatch has been resolved (e.g., using wrappers).
The reality of building enterprise applications is that impedance mismatch and data integration problems go hand in hand. Solutions that address language binding but ignore data reshaping fall short when it comes to accessing legacy databases, or when applications and data need to evolve independently. Solutions that target data reshaping and leave language binding out of scope push a lot of plumbing work back to the developers.
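The "relatively straightforward" one-to-one case mentioned above can be sketched in a few lines. This is an illustration under stated assumptions, not an implementation from the paper: `TableMapping`, `fetch_all`, the `Customer` class, and the `cust` table are all hypothetical names. The point is that the mapping is a data value that parameterizes the generated query instead of being hardwired into the access code.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Customer:
    id: int
    full_name: str


@dataclass
class TableMapping:
    """Hypothetical one-to-one mapping: one table row <-> one object."""
    table: str      # relational table name
    cls: type       # class to instantiate for each row
    columns: dict   # attribute name -> column name


def fetch_all(conn, mapping):
    """Generate a SELECT from the mapping and expose each row as an object.

    Table and column names come from the trusted mapping definition,
    not from user input, so they are interpolated directly.
    """
    attrs = list(mapping.columns)
    cols = ", ".join(mapping.columns[a] for a in attrs)
    key_col = mapping.columns[attrs[0]]  # order by the first mapped column
    sql = f"SELECT {cols} FROM {mapping.table} ORDER BY {key_col}"
    return [mapping.cls(**dict(zip(attrs, row))) for row in conn.execute(sql)]


CUSTOMER_MAPPING = TableMapping(
    table="cust",
    cls=Customer,
    columns={"id": "cust_id", "full_name": "cust_name"},
)


def demo_customers():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE cust (cust_id INTEGER PRIMARY KEY, cust_name TEXT);
        INSERT INTO cust VALUES (1, 'Ann'), (2, 'Bob');
        """
    )
    try:
        return fetch_all(conn, CUSTOMER_MAPPING)
    finally:
        conn.close()
```

A mapper this simple handles neither data reshaping (an object spread over several tables) nor updates, which is where, as the text notes, ad hoc solutions start to grow out of hand.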







