A Performance Evaluation of Alternative Mapping Schemes for Storing XML Data in a Relational Database
1999
Cited by 117 (1 self)
XML is emerging as one of the dominant data formats for data processing on the Internet. To query XML data, query languages like XQL, Lorel, XML-QL, or XML-GL have been proposed. In this paper, we study how XML data can be stored and queried using a standard relational database system. For this purpose, we present alternative mapping schemes to store XML data in a relational database and discuss how XML-QL queries can be translated into SQL queries for every mapping scheme. We present the results of comprehensive performance experiments that analyze the tradeoffs of the alternative mapping schemes in terms of database size, query performance and update performance. While our discussion is focussed on XML and XML-QL, the results of this paper are relevant for most semi-structured data models and most query languages for semi-structured data.
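To make the idea of a mapping scheme concrete, here is a minimal sketch of one generic option: a single "edge" table in which every parent-child edge, attribute and text value becomes a row, and a path query turns into SQL self-joins. The column layout (source, ordinal, name, flag, target), the `#text` label and the function names are illustrative assumptions, not the paper's exact schemes or its XML-QL translation.

```python
import sqlite3
import xml.etree.ElementTree as ET


def shred_edges(xml_text):
    """Shred an XML document into rows of one edge table (illustrative layout).

    Each row is (source, ordinal, name, flag, target): an element edge has
    flag 'ref' and the child's node id as target; an attribute or text value
    has flag 'val' and the string itself as target. Node 0 stands for the
    document, so the root element is reached by an edge from node 0.
    Tail text between sibling elements (mixed content) is ignored here.
    """
    root = ET.fromstring(xml_text)
    rows = []
    counter = [0]

    def visit(elem, parent, ordinal):
        counter[0] += 1
        node = counter[0]
        rows.append((parent, ordinal, elem.tag, "ref", node))
        pos = 0
        for attr, value in elem.attrib.items():
            pos += 1
            rows.append((node, pos, attr, "val", value))
        text = (elem.text or "").strip()
        if text:
            pos += 1
            rows.append((node, pos, "#text", "val", text))
        for child in elem:
            pos += 1
            visit(child, node, pos)

    visit(root, 0, 1)
    return rows


def book_titles(rows):
    """Answer the path query book/title/text() with SQL self-joins on the edge table."""
    con = sqlite3.connect(":memory:")
    try:
        # 'target' has no declared type, so integers and strings are stored as given.
        con.execute("CREATE TABLE edge (source INTEGER, ordinal INTEGER, "
                    "name TEXT, flag TEXT, target)")
        con.executemany("INSERT INTO edge VALUES (?, ?, ?, ?, ?)", rows)
        query = """
            SELECT t.target
            FROM edge b
            JOIN edge ti ON ti.source = b.target AND ti.name = 'title'
            JOIN edge t  ON t.source = ti.target AND t.name = '#text'
            WHERE b.name = 'book'
            ORDER BY b.target, ti.ordinal
        """
        return [value for (value,) in con.execute(query)]
    finally:
        con.close()
```

Each step of the path costs one more self-join on the same table, which hints at why the choice of mapping scheme matters for query performance.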
Database architecture optimized for the new bottleneck: Memory access
In Proceedings of the VLDB Conference, 1999
Cited by 109 (9 self)
In the past decade, advances in the speed of commodity CPUs have far outpaced advances in memory latency. Main-memory access is therefore increasingly a performance bottleneck for many computer applications, including database systems. In this article, we use a simple scan test to show the severe impact of this bottleneck. The insights gained are translated into guidelines for database architecture, in terms of both data structures and algorithms. We discuss how vertically fragmented data structures optimize cache performance on sequential data access. We then focus on equi-join, typically a random-access operation, and introduce radix algorithms for partitioned hash-join. The performance of these algorithms is quantified using a detailed analytical model that incorporates memory access cost. Experiments that validate this model were performed on the Monet database system. We obtained exact statistics on events such as TLB misses and L1 and L2 cache misses by using hardware performance counters found in modern CPUs. Using our cost model, we show how the carefully tuned memory access pattern of our radix algorithms makes them perform well, which is confirmed by experimental results.
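The partitioned hash-join idea can be sketched as follows: both inputs are first radix-clustered on the low bits of the join key's hash, in one or more passes, so that each pair of matching clusters can be joined with a small hash table. This is a structural sketch only, assuming integer-like keys and Python's built-in `hash`; the function names and the default split of 4 radix bits into two passes of 2 are illustrative assumptions, and Python will not reproduce the cache and TLB effects the paper measures.

```python
def radix_cluster(rows, key, pass_bits):
    """Partition rows on the low sum(pass_bits) bits of hash(key(row)).

    Each pass splits every existing cluster into 2**b sub-clusters using the
    next b bits, taken from the most significant radix bit downwards, so the
    final list index equals the radix value (empty clusters are kept).
    """
    shift = sum(pass_bits)
    clusters = [list(rows)]
    for b in pass_bits:
        shift -= b
        mask = (1 << b) - 1
        refined = []
        for cluster in clusters:
            buckets = [[] for _ in range(1 << b)]
            for row in cluster:
                buckets[(hash(key(row)) >> shift) & mask].append(row)
            refined.extend(buckets)
        clusters = refined
    return clusters


def radix_hash_join(r, s, r_key, s_key, pass_bits=(2, 2)):
    """Equi-join r and s: radix-cluster both inputs, then hash-join matching clusters."""
    out = []
    for r_part, s_part in zip(radix_cluster(r, r_key, pass_bits),
                              radix_cluster(s, s_key, pass_bits)):
        table = {}
        for row in r_part:  # build side: one small hash table per cluster
            table.setdefault(r_key(row), []).append(row)
        for row in s_part:  # probe side: only the matching cluster is touched
            for match in table.get(s_key(row), []):
                out.append((match, row))
    return out
```

Splitting the radix bits over several passes keeps the number of clusters written per pass small, which is the knob the abstract's memory-access cost model is meant to tune.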
MIL Primitives For Querying A Fragmented World
1999
Cited by 57 (16 self)
In query-intensive database application areas, like decision support and data mining, systems that use vertical fragmentation have a significant performance advantage. In order to support relational or object-oriented applications on top of such a fragmented data model, a flexible yet powerful intermediate language is needed. This problem has been successfully tackled in Monet, a modern extensible database kernel developed by our group. We focus on the design choices made in the Monet Interpreter Language (MIL), its algebraic query language, and outline how its concept of tactical optimization enhances and simplifies the optimization of complex queries. Finally, we summarize the experience gained in Monet by creating a highly efficient implementation of MIL.
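One reading of tactical optimization is that an algebra operator picks its concrete algorithm at run time from properties of its operands, rather than the query optimizer fixing it in advance. The sketch below illustrates that reading on a binary table (a BAT in Monet terms) holding object ids and attribute values: a range select uses binary search when the value column is known to be sorted and a scan otherwise. The class, the `tail_sorted` flag and the strategy labels are assumptions for illustration, not MIL's actual operators or signatures.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass
class BAT:
    """A binary table: head holds object ids, tail holds attribute values."""
    head: list
    tail: list
    tail_sorted: bool = False  # property recorded alongside the data (assumed)

    @classmethod
    def from_pairs(cls, pairs):
        head = [h for h, _ in pairs]
        tail = [t for _, t in pairs]
        is_sorted = all(a <= b for a, b in zip(tail, tail[1:]))
        return cls(head, tail, is_sorted)


def select_range(bat, lo, hi):
    """Return (result BAT, strategy) for lo <= tail <= hi.

    The algorithm is chosen at run time from the operand's properties:
    binary search when the tail is known to be sorted, a full scan otherwise.
    """
    if bat.tail_sorted:
        i = bisect_left(bat.tail, lo)
        j = bisect_right(bat.tail, hi)
        # A contiguous slice of a sorted tail is still sorted.
        return BAT(bat.head[i:j], bat.tail[i:j], True), "binary-search"
    keep = [k for k, v in enumerate(bat.tail) if lo <= v <= hi]
    # Sortedness of the scan result is not checked, so the flag stays False.
    return BAT([bat.head[k] for k in keep], [bat.tail[k] for k in keep], False), "scan"
```

Because the result carries its own properties, a later operator in the same query can make the same kind of run-time choice without any global planning.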
A Case for Fractured Mirrors
International Conference on Very Large Databases (Hong Kong), 2002
Cited by 43 (1 self)
The Decomposition Storage Model (DSM) vertically partitions all attributes of a given relation. DSM has excellent I/O behavior when the number of attributes touched in the query is small. It also has a better cache footprint than the N-ary storage model (NSM) that is used by most database systems. However, DSM incurs a high cost in reconstructing the original tuple from the partitions. We first revisit some of the performance problems associated with DSM. We suggest a simple indexing strategy and compare different reconstruction algorithms. The paper then proposes a new mirroring scheme, termed fractured mirrors, using both NSM and DSM models. This scheme combines the best aspects of both models, along with the added benefit of mirroring to better serve an ad-hoc query workload. A prototype system has been built using the Shore storage manager, and performance is evaluated using queries from the TPC-H workload.
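The trade-off between the two layouts can be sketched in a few lines: DSM keeps one array per attribute (the array position acts as the row id), NSM keeps whole tuples, and reconstructing tuples from DSM is a positional join across the touched columns. Keeping both copies and routing each scan by how many attributes it touches is a simplified stand-in for the mirroring idea; the class name and the `dsm_max_attrs` threshold are hypothetical, and the sketch ignores disk placement, mirroring for availability and the indexing strategy the abstract mentions.

```python
def to_dsm(rows, attrs):
    """Vertically partition row-major (NSM) tuples into one list per attribute.

    The list position doubles as the implicit row id, so no oid column is stored.
    """
    return {a: [row[i] for row in rows] for i, a in enumerate(attrs)}


def reconstruct(dsm, wanted, positions):
    """Rebuild (partial) tuples by a positional join across the wanted columns."""
    return [tuple(dsm[a][p] for a in wanted) for p in positions]


class MirroredTable:
    """Keeps an NSM copy and a DSM copy of the same relation and routes scans."""

    def __init__(self, rows, attrs, dsm_max_attrs=2):
        self.attrs = list(attrs)
        self.nsm = [tuple(r) for r in rows]
        self.dsm = to_dsm(self.nsm, self.attrs)
        self.dsm_max_attrs = dsm_max_attrs  # hypothetical routing threshold

    def scan(self, wanted):
        """Project `wanted` attributes, using DSM when few attributes are touched."""
        if len(wanted) <= self.dsm_max_attrs:
            return "dsm", reconstruct(self.dsm, wanted, range(len(self.nsm)))
        idx = [self.attrs.index(a) for a in wanted]
        return "nsm", [tuple(row[i] for i in idx) for row in self.nsm]
```

A narrow projection reads only the columns it needs, while a wide one avoids paying the reconstruction cost by reading whole tuples from the NSM copy.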
Extending RDBMSs to support sparse datasets using an interpreted attribute storage format
In ICDE, 2006
Cited by 19 (0 self)
“Sparse” data, in which relations have many attributes that are null for most tuples, presents a challenge for relational database management systems. If one uses the normal “horizontal” schema to store such data sets in any of the three leading commercial RDBMSs, the result is tables that occupy vast amounts of storage, most of which is devoted to nulls. If one attempts to avoid this storage blowup by using a “vertical” schema, the storage utilization is indeed better, but query performance is orders of magnitude slower for certain classes of queries. In this paper, we argue that the proper way to handle sparse data is not to use a vertical schema, but rather to extend the RDBMS tuple storage format to allow the representation of sparse attributes as interpreted fields. The addition of interpreted storage allows for efficient and transparent querying of sparse data, uniform access to all attributes, and schema scalability. We show, through an implementation in PostgreSQL, that the interpreted storage approach dominates the current horizontal storage and vertical schema approaches in query efficiency and ease of use over a wide range of queries and sparse data sets.
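The core of an interpreted format is that each tuple stores only its non-null attributes as (attribute id, value) pairs, and reading an attribute means interpreting the tuple to find the matching pair. The in-memory sketch below illustrates that idea; the class and method names are hypothetical and it does not reflect the paper's PostgreSQL on-disk layout.

```python
class InterpretedTable:
    """Stores sparse attributes as (attribute id, value) pairs; nulls take no space."""

    def __init__(self):
        self.catalog = {}  # attribute name -> small integer id
        self.records = []  # one list of (attr_id, value) pairs per tuple

    def _attr_id(self, name):
        # A new attribute only adds a catalog entry; existing tuples are untouched.
        return self.catalog.setdefault(name, len(self.catalog))

    def insert(self, **values):
        pairs = [(self._attr_id(k), v) for k, v in values.items() if v is not None]
        pairs.sort(key=lambda p: p[0])
        self.records.append(pairs)

    def get(self, record, name):
        """Interpret a record: return the value for `name`, or None if it is null."""
        attr_id = self.catalog.get(name)
        for a, v in record:
            if a == attr_id:
                return v
        return None

    def select(self, name, predicate):
        """Return records whose `name` attribute is non-null and satisfies predicate."""
        result = []
        for record in self.records:
            value = self.get(record, name)
            if value is not None and predicate(value):
                result.append(record)
        return result

    def as_dict(self, record):
        names = {i: n for n, i in self.catalog.items()}
        return {names[a]: v for a, v in record}

    def stored_values(self):
        """Number of stored (non-null) values across all tuples."""
        return sum(len(r) for r in self.records)
```

For example, three tuples over the attributes id, color and weight store 7 values here instead of the 9 slots a horizontal table with one column per attribute would need, and every attribute stays queryable through the same `select` call.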
On the integration of IR and databases
In Database Issues in Multimedia; short paper proceedings, International Conference on Database Semantics (DS-8), 1999
Cited by 16 (8 self)
Integration of information retrieval (IR) in database management systems (DBMSs) has proven difficult. Previous attempts at integration suffered from inherent performance problems, or lacked the desirable separation between logical and physical data models. To overcome these problems, we discuss a database approach based on structural object-orientation. We implement IR techniques using extensions in an object algebra called MOA. MOA has been implemented on top of the database backend Monet, a state-of-the-art high-performance database kernel with a binary relational interface. Our prototype implementation of the inference network retrieval model using MOA and Monet demonstrates the feasibility of this approach. We conclude with a discussion of the advantages of our database design.
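As a rough illustration of what an inference network retrieval model computes, the sketch below scores documents with the well-known INQUERY-style term belief function and averages term beliefs as a #sum-style query operator would. Using this particular belief formula, the default belief of 0.4, and whitespace tokenization are assumptions; the paper's MOA extensions express the computation as algebra operators over Monet's binary tables, which this plain-Python version does not attempt to mimic.

```python
import math
from collections import Counter


def inquery_scores(docs, query_terms, default_belief=0.4):
    """Score docs for a #sum-style query using an INQUERY-style belief function.

    bel(t, d) = b + (1 - b) * ntf * nidf, with
      ntf  = tf / (tf + 0.5 + 1.5 * dl / avgdl)
      nidf = log((N + 0.5) / df) / log(N + 1)
    and the query node averages the term beliefs.
    """
    if not query_terms:
        raise ValueError("query must contain at least one term")
    tf = {d: Counter(text.split()) for d, text in docs.items()}
    dl = {d: sum(c.values()) for d, c in tf.items()}
    n_docs = len(docs)
    avgdl = sum(dl.values()) / n_docs
    # Iterating a Counter yields each distinct term once, so this counts documents.
    df = Counter(term for counts in tf.values() for term in counts)
    scores = {}
    for d in docs:
        beliefs = []
        for term in query_terms:
            if df[term] == 0:
                beliefs.append(default_belief)  # unseen term: default belief only
                continue
            f = tf[d][term]
            ntf = f / (f + 0.5 + 1.5 * dl[d] / avgdl)
            nidf = math.log((n_docs + 0.5) / df[term]) / math.log(n_docs + 1)
            beliefs.append(default_belief + (1 - default_belief) * ntf * nidf)
        scores[d] = sum(beliefs) / len(beliefs)
    return scores
```

A document that never mentions a query term keeps exactly the default belief, so ranking is driven entirely by the term-frequency and inverse-document-frequency components.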
Content-Based Video Retrieval by Integrating Spatio-Temporal and Stochastic Recognition of Events
In Proceedings of the IEEE Intl. Workshop on Detection and Recognition of Events in Video, 2001
Cited by 16 (7 self)
As the amount of publicly available video data grows, the need to query this data efficiently becomes significant. Consequently, content-based retrieval of video data turns out to be a challenging and important problem. In this paper, we address the specific aspect of inferring semantics automatically from raw video data. In particular, we introduce a new video data model that supports the integrated use of two different approaches for mapping low-level features to high-level concepts. First, the model is extended with a rule-based approach that supports spatio-temporal formalization of high-level concepts, and then with a stochastic approach. Furthermore, results on real tennis video data are presented, demonstrating the validity of both approaches, as well as the advantages of their integrated use.
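A spatio-temporal rule typically combines a spatial predicate on tracked objects with a temporal constraint such as "for at least k consecutive frames". The sketch below shows that pattern with a hypothetical tennis rule (the player stays near the net line); the frame format, the `player_y` field, the net position and the tolerance are all assumptions, not the paper's rule language or features.

```python
def hold_intervals(frames, condition, min_frames):
    """Return (first, last) frame indices of maximal runs in which condition
    holds for at least min_frames consecutive frames."""
    intervals = []
    start = None
    for i, frame in enumerate(frames):
        if condition(frame):
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_frames:
                intervals.append((start, i - 1))
            start = None
    # Close a run that lasts until the final frame.
    if start is not None and len(frames) - start >= min_frames:
        intervals.append((start, len(frames) - 1))
    return intervals


def near_net(frame, net_y=0.0, tolerance=2.0):
    """Hypothetical spatial predicate: the tracked player is close to the net line."""
    return abs(frame["player_y"] - net_y) < tolerance
```

Short excursions toward the net are filtered out by `min_frames`, which is the temporal half of the rule; the stochastic approach in the abstract would replace such hand-written thresholds with learned models.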
The Mirror MMDBMS architecture
1999
Cited by 12 (9 self)
Introduction
Handling large collections of digitized multimedia data, usually referred to as multimedia digital libraries, is a major challenge for information technology. The Mirror DBMS is a research database system developed to better understand the kind of data management required in the context of multimedia digital libraries (see also URL http://www.cs.utwente.nl/arjen/mmdb.html). Its main features are an integrated approach to both content management and (traditional) structured data management, and the implementation of an extensible object-oriented logical data model on a binary relational physical data model. This work focuses on design for scalability.
2 Query processing
The query facilities of the Mirror DBMS rely on the Moa Object Algebra [BWK98]. Moa constitutes an object data model and query algebra, designed to be used at the logical level of a DBMS. It ...
A Framework for Video Modelling
In Proc. of the International Conference on Applied Informatics, 2000
Cited by 11 (5 self)
In recent years, research in video databases has increased greatly, but relatively little work has been done in the area of semantic content-based retrieval. In this paper, we present a framework for video modelling with emphasis on the semantic content of video data. The video data model presented distinguishes four layers: the raw data layer, the feature layer, the object layer and the event layer. It supports automatic definition of high-level concepts, such as video objects and events, based on extracted features. We focus our attention on event descriptions in this paper and give two modelling examples in the medical and soccer domains to show that the proposed event grammar can be efficiently used in different domains. Key Words: multimedia, video modelling, content-based retrieval
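In a layered model like this, the event layer builds composite events out of simpler events detected from the object layer. The sketch below shows one plausible grammar operator, an ordered sequence with a maximum gap between consecutive parts, applied to a hypothetical soccer example (a pass followed by a shot yields an attack). The `Event` type, the operator and the event names are illustrative assumptions; the paper's actual event grammar is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    name: str
    start: int  # first frame
    end: int    # last frame


def compose_sequence(name, parts, events, max_gap):
    """Derive composite events `name` from events occurring in the order given by
    `parts`, each starting at most max_gap frames after the previous one ends."""
    by_name = {}
    for e in sorted(events, key=lambda e: e.start):
        by_name.setdefault(e.name, []).append(e)
    found = []
    for first in by_name.get(parts[0], []):
        current = first
        for part in parts[1:]:
            # Take the earliest candidate that starts within the allowed gap.
            nxt = next((e for e in by_name.get(part, [])
                        if current.end < e.start <= current.end + max_gap), None)
            if nxt is None:
                current = None
                break
            current = nxt
        if current is not None:
            found.append(Event(name, first.start, current.end))
    return found
```

Because composite events have the same shape as primitive ones, they can themselves be fed into further rules, which is how a grammar of this kind can be reused across domains.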
Modelling and Querying Semistructured Data with MOA
In Proceedings of the Workshop on Query Processing for Semistructured Data and Non-standard Data Formats, 1999
Cited by 9 (2 self)
This paper is organized as follows: Section 2 describes our view on the modelling of semistructured data. Section 3 discusses the Monet database, the MOA object algebra, and the ANY structure extension. Section 4 gives an overview of the work that still needs to be done, and Section 5 discusses the conclusions.

