Storing semistructured data with STORED
Cited by 214 (8 self)
Systems for managing and querying semistructured-data sources often store data in proprietary object repositories or in a tagged-text format. We describe a technique that can use relational database management systems to store and manage semistructured data. Our technique relies on a mapping between the semistructured data model and the relational data model, expressed in a query language called STORED. When a semistructured data instance is given, a STORED mapping can be generated automatically using data-mining techniques. We are interested in applying STORED to XML data, which is an instance of semistructured data. We show how a document-type-descriptor (DTD), when present, can be exploited to further improve performance.
Semistructured Data and XML
1998
Cited by 59 (1 self)
This paper argues that the research on semistructured data is receiving a new set of challenges with the advent of XML (Extensible Mark-up Language [Bos97, Con98]). This is a new standard approved by the World Wide Web Consortium that many believe will become the de facto data exchange format for the Web. XML supports the electronic exchange of machine-readable data (while HTML is designed primarily for human-readable documents). XML data shares many features of semistructured data: its structure can be irregular, is not always known ahead of time, and may change frequently and without notice. On the other hand, it is easy to convert data from any source into XML, which will make it attractive for organizations to "publish" their information sources in XML, and thus make them available to other XML applications on the Web. For XML applications to reach their full potential, however, we need to build the right tools to process data in this new format. Existing Web tools (browsers, search engines) are oriented toward document operations. For XML we need database operations, like data extraction, data integration, data translation, and data storage. The research done so far on semistructured data may offer some solutions to the database problems posed by XML. For example, the recently proposed query language for XML, called XML-QL [DFF
Storing Semistructured Data in Relations
In Proceedings of the Workshop on Query Processing for Semistructured Data and Non-Standard Data Formats
Cited by 7 (0 self)
In this paper we argue that one can store semistructured data in relational format, by exploiting the regularities inherent in existing semistructured data instances. "Most" of the data will be stored in relational format: the outliers, and possible future insertions, will still be stored in a self-describing way. We propose to use data mining techniques to extract a "good" relational schema for a given semistructured data instance. Our algorithm accepts a variety of input parameters, such as the maximum number of relations allowed, the maximum number of attributes per relation, and, optionally, a collection of queries on the semistructured data for which the relational storage has to be optimized. Experimental results on the DBLP data show that around 90% of the data can be stored in relational format. The techniques described here are presented in more detail in [DFS98].
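The storage idea in this abstract (regular data in relations, outliers kept in a self-describing form) can be illustrated with a minimal sketch. This is not the algorithm from the paper: the function name split_storage, the min_support parameter, the single-relation layout, and the sample records are all assumptions made for illustration, and a simple frequency threshold stands in for the data mining step.

```python
from collections import Counter


def split_storage(objects, min_support=0.5):
    """Split semistructured objects into relational rows and an overflow list.

    Hypothetical helper: a frequency threshold stands in for schema mining.
    """
    # Count in how many objects each attribute appears.
    counts = Counter(attr for obj in objects for attr in obj)
    threshold = min_support * len(objects)
    # Attributes that occur often enough become columns of the relation.
    columns = sorted(a for a, c in counts.items() if c >= threshold)

    rows, overflow = [], []
    for oid, obj in enumerate(objects):
        # Missing column values are stored as None (a relational NULL).
        rows.append((oid, *[obj.get(col) for col in columns]))
        # Rare attributes stay self-describing as (object id, name, value).
        for attr, value in obj.items():
            if attr not in columns:
                overflow.append((oid, attr, value))
    return columns, rows, overflow


# Made-up sample records, loosely shaped like bibliography entries.
records = [
    {"title": "A", "year": 1998, "author": "X"},
    {"title": "B", "year": 1999},
    {"title": "C", "year": 1997, "note": "draft"},
]
columns, rows, overflow = split_storage(records, min_support=0.6)
```

With these sample records, title and year appear in every object and become columns, while author and note each appear once and fall into the overflow list. A real schema extractor would also have to respect the limits on relations and attributes and the query workload mentioned in the abstract.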

