Adding structure to unstructured data
- In 6th Int. Conf. on Database Theory (ICDT ’97), LNCS 1186, 336–350, 1997
Cited by 195 (22 self)
We develop a new schema for unstructured data. Traditional schemas resemble the type systems of programming languages. For unstructured data, however, the underlying type may be much less constrained and hence an alternative way of expressing constraints on the data is needed. Here, we propose that both data and schema be represented as edge-labeled graphs. We develop notions of conformance between a graph database and a graph schema and show that there is a natural and efficiently computable ordering on graph schemas. We then examine certain subclasses of schemas and show that schemas are closed under query applications. Finally, we discuss how they may be used in query decomposition and optimization.
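One way to picture conformance between a graph database and a graph schema is as a simulation check between two edge-labeled graphs. The sketch below is only an illustration under stated assumptions: the abstract does not spell out the conformance relation, so simulation is assumed here, schema edges carry plain labels that must match exactly (a simplification), and the names `conforms`, `data_edges`, and `schema_edges` are hypothetical.

```python
from typing import Iterable, Set, Tuple

# An edge is (source node, edge label, target node).
Edge = Tuple[str, str, str]


def conforms(data_edges: Iterable[Edge], data_root: str,
             schema_edges: Iterable[Edge], schema_root: str) -> bool:
    """Return True if the schema graph simulates the data graph from the roots.

    Assumption: conformance is modelled as graph simulation with exact
    label matching; this is an illustrative simplification.
    """
    data_edges = list(data_edges)
    schema_edges = list(schema_edges)
    data_nodes: Set[str] = {data_root} | {n for s, _, t in data_edges for n in (s, t)}
    schema_nodes: Set[str] = {schema_root} | {n for s, _, t in schema_edges for n in (s, t)}

    # Start with every (data node, schema node) pair and repeatedly remove
    # pairs that violate the simulation condition until nothing changes.
    sim = {(d, s) for d in data_nodes for s in schema_nodes}
    changed = True
    while changed:
        changed = False
        for d, s in list(sim):
            for src, label, tgt in data_edges:
                if src != d:
                    continue
                # Each data edge needs a schema edge with the same label
                # whose target still simulates the data edge's target.
                matched = any(
                    s2 == s and l2 == label and (tgt, t2) in sim
                    for s2, l2, t2 in schema_edges
                )
                if not matched:
                    sim.discard((d, s))
                    changed = True
                    break
    return (data_root, schema_root) in sim


# Hypothetical example data: one paper with a title and an author.
data = [("root", "paper", "p1"), ("p1", "title", "t1"), ("p1", "author", "a1")]
schema = [("R", "paper", "P"), ("P", "title", "S"), ("P", "author", "S")]
print(conforms(data, "root", schema, "R"))
```

The fixed-point refinement runs in polynomial time, which is in the spirit of the "efficiently computable" claim, though it is not necessarily the algorithm the authors use.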
HTML Document Analysis for Information Extraction
- In Proceedings of 8th EEICT conference. Brno, CZ, FIT VUT, 2002
Cited by 1 (0 self)
In this paper we propose a model of a Web site that describes the logical structure of the contained data. Furthermore, we propose methods for creating such a model by analyzing the look and the structure of HTML documents.
PIA -- A Generic Model and System for Interactive Product and Service Catalogs
- 1999
Cited by 1 (0 self)
This text motivates and defines a generic model for interactive (online or offline) product catalogs. Based on a detailed requirements analysis, the data model is defined using an object-oriented design notation, and the query language for expressing customer interests on the catalog is defined using techniques from fuzzy set theory. The model provides the basis for the implementation of a generic, highly interactive catalog management system which is designed to be interfaced with relational databases, information-retrieval engines and special-purpose index structures.
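As a rough illustration of how customer interests might be expressed with fuzzy sets, the sketch below scores catalog items by a degree of membership between 0 and 1 and ranks them. It is not the PIA query language: the helper names (`at_most`, `fuzzy_and`, `rank`), the linear membership function, the use of the minimum for conjunction, and the sample products are all assumptions made for this example.

```python
from typing import Callable, Dict, List, Tuple

Product = Dict[str, float]
Membership = Callable[[Product], float]


def at_most(attr: str, ideal: float, limit: float) -> Membership:
    """Fuzzy 'attr should be at most about ideal'.

    Degree is 1.0 up to `ideal` and falls linearly to 0.0 at `limit`.
    """
    def mu(p: Product) -> float:
        v = p[attr]
        if v <= ideal:
            return 1.0
        if v >= limit:
            return 0.0
        return (limit - v) / (limit - ideal)
    return mu


def fuzzy_and(*terms: Membership) -> Membership:
    """Conjunction as the minimum of the membership degrees."""
    return lambda p: min(t(p) for t in terms)


def rank(products: Dict[str, Product], query: Membership) -> List[Tuple[str, float]]:
    """Return (name, degree) pairs with degree > 0, best match first."""
    scored = [(name, query(p)) for name, p in products.items()]
    return sorted((x for x in scored if x[1] > 0), key=lambda x: -x[1])


# Hypothetical catalog entries.
catalog = {
    "a": {"price": 100.0, "weight": 2.0},
    "b": {"price": 150.0, "weight": 2.0},
    "c": {"price": 300.0, "weight": 2.0},
}
query = fuzzy_and(at_most("price", 100.0, 200.0), at_most("weight", 3.0, 5.0))
print(rank(catalog, query))
```

Unlike a crisp filter, a query of this kind still returns item "b", which exceeds the ideal price, but ranks it below the exact match.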
AMetaCar -- a mediated eCommerce-Solution for the Used-Car-Market
- University of Berlin, 1999
Providing integrated access to multiple information offerings on the Web poses some fundamentally new challenges for practical federated information systems. Unlike conventional databases, which have an explicit and rather stable schema and expressive query APIs, Web sources deliver semi-structured data without explicit logical structure and semantics, and have rather limited query capabilities. In this paper we present AMetaCar, a federated Web information system that provides integrated access to a number of existing Web catalogues for used cars. At the wrapper level, the sources are continually monitored, and their implicit structure is made explicit by means of XML. At the mediator level a hybrid approach is used, which combines a relational DBMS for the regular and common attributes of used-car offers with a persistent XML-DOM (Document Object Model) implementation for the irregular and heterogeneous attributes. XSL is used for presenting XML results. This overall approach combines the...
Using WG-Log to interact with heterogeneous data sources: the example of OEM
In this paper we discuss the possibility of representing semistructured information synthetically via a loose notion of schema: we say that data are semistructured when, although some structure is present, it is not as strict, regular, or complete as that required by traditional database management systems. Our proposal is based on WG-Log, a graph-based language for the representation of WWW site information. We show how information encoded in a typical semistructured information model, such as OEM, can be represented and queried by means of the WG-Log language, and how the TSIMMIS and WG-Log Web Query System can be integrated to allow site content exploration and exploitation by means of WG-Log.
1 Introduction
We say that data are semistructured when, although some structure is present, it is not as strict, regular, or complete as that required by traditional database management systems (see [Abi97] for a survey on semistructured data). Information is semistructured also when...

