Querying Semi-Structured Data, 1997
Cited by 467 (19 self)
The amount of data of all kinds available electronically has increased dramatically in recent years. The data resides in different forms, ranging from unstructured data in file systems to highly structured in relational database systems. Data is accessible through a variety of interfaces including Web browsers, database query languages, application-specific interfaces, or data exchange formats. Some of this data is raw data, e.g. images or sound. Some of it has structure even if the structure is often implicit, and not as rigid or regular as that found in standard database systems. Sometimes the structure exists but has to be extracted from the data. Sometimes also it exists but we prefer to ignore it for certain purposes such as browsing. We call here semi-structured data this data that is (from a particular viewpoint) neither raw data nor strictly typed, i.e., not table-oriented as in a relational model or sorted-graph as in object databases...
Indexing and Querying XML Data for Regular Path Expressions - In VLDB, 2001
Cited by 265 (9 self)
With the advent of XML as a standard for data representation and exchange on the Internet, storing and querying XML data becomes more and more important. Several XML query languages have been proposed, and the common feature of the languages is the use of regular path expressions to query XML data. This poses a new challenge concerning indexing and searching XML data, because conventional approaches based on tree traversals may not meet the processing requirements under heavy access requests. In this paper, we propose a new system for indexing and storing XML data based on a numbering scheme for elements. This numbering scheme quickly determines the ancestor-descendant relationship between elements in the hierarchy of XML data. We also propose several algorithms for processing regular path expressions, namely, (1) EE-Join for searching paths from an element to another, (2) EA-Join for scanning sorted elements and attributes to find element-attribute pairs, and (3) KC-Join for finding Kleene-Closure on repeated paths or elements. The EE-Join algorithm is highly effective particularly for searching paths that are very long or whose lengths are unknown. Experimental results from our prototype system implementation show that the proposed algorithms can process XML queries with regular path expressions by up to an order of magnitude...
This work was sponsored in part by National Science Foundation CAREER Award (IIS-9876037) and Research Infrastructure program EIA-0080123. The authors assume all responsibility for the contents of the paper.
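The idea of deciding ancestor-descendant relationships from element numbers alone can be illustrated with a small sketch. It assumes one simple interval-style labeling, where each element gets a preorder rank and its number of descendants; this is an illustration of the general technique, not necessarily the exact scheme of the paper, and the names `number_elements` and `is_ancestor` are hypothetical.

```python
import xml.etree.ElementTree as ET

def number_elements(root):
    """Label every element with (order, size).

    order = preorder rank of the element, size = number of its descendants.
    Assumption: a tight labeling with no spare numbers reserved for updates.
    """
    labels = {}
    counter = 0

    def visit(elem):
        nonlocal counter
        order = counter
        counter += 1
        for child in elem:
            visit(child)
        # Everything numbered since `order` (excluding elem itself) is a descendant.
        labels[elem] = (order, counter - order - 1)

    visit(root)
    return labels

def is_ancestor(labels, anc, desc):
    """True if `anc` is a proper ancestor of `desc`, decided from labels only."""
    a_order, a_size = labels[anc]
    d_order, _ = labels[desc]
    return a_order < d_order <= a_order + a_size

# Illustrative document (made up for this example).
doc = ET.fromstring("<book><chapter><title/></chapter><appendix/></book>")
labels = number_elements(doc)
chapter, appendix = doc[0], doc[1]
title = chapter[0]

book_over_title = is_ancestor(labels, doc, title)
chapter_over_appendix = is_ancestor(labels, chapter, appendix)
```

The point of such a labeling is that the containment test needs no tree traversal: two labels fetched from an index are enough to answer it.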
Integration of Heterogeneous Databases Without Common Domains Using Queries Based on Textual Similarity, 1998
Cited by 193 (13 self)
Most databases contain "name constants" like course numbers, personal names, and place names that correspond to entities in the real world. Previous work in integration of heterogeneous databases has assumed that local name constants can be mapped into an appropriate global domain by normalization. However, in many cases, this assumption does not hold; determining if two name constants should be considered identical can require detailed knowledge of the world, the purpose of the user's query, or both. In this paper, we reject the assumption that global domains can be easily constructed, and assume instead that the names are given in natural language text. We then propose a logic called WHIRL which reasons explicitly about the similarity of local names, as measured using the vector-space model commonly adopted in statistical information retrieval. We describe an efficient implementation of WHIRL and evaluate it experimentally on data extracted from the World Wide Web. We show that WHIR...
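Measuring the similarity of name constants with the vector-space model can be sketched as follows. This is a minimal illustration using one common TF-IDF weighting and cosine similarity; the weighting formula, the function names, and the sample strings are assumptions made for the example and are not taken from WHIRL itself.

```python
import math
from collections import Counter

def tfidf_vectors(names):
    """Turn each name into a unit-length TF-IDF vector (dict: term -> weight).

    Assumption: whitespace tokenization and the weight
    (1 + log tf) * log(1 + N / df), one of several common variants.
    """
    docs = [Counter(name.lower().split()) for name in names]
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in doc)
    vectors = []
    for doc in docs:
        vec = {t: (1 + math.log(tf)) * math.log(1 + n_docs / df[t])
               for t, tf in doc.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        # An empty name yields an empty vector instead of dividing by zero.
        vectors.append({t: w / norm for t, w in vec.items()} if norm else {})
    return vectors

def cosine(u, v):
    """Cosine similarity of two unit-length sparse vectors."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())

# Made-up name constants from two hypothetical sources.
names = ["data base systems course",
         "course on database systems",
         "intro to poetry"]
vecs = tfidf_vectors(names)
sim_related = cosine(vecs[0], vecs[1])
sim_unrelated = cosine(vecs[0], vecs[2])
```

Names that share informative terms score above zero even when they are not string-identical, which is what lets a similarity-based join replace exact matching on a normalized global domain.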
Typechecking for XML Transformers - In Proceedings of the Nineteenth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, 2000
Optimizing Regular Path Expressions Using Graph Schemas, 1998
Cited by 136 (5 self)
Several languages, such as LOREL and UnQL, support querying of semi-structured data. Others, such as WebSQL and WebLog, query Web sites. All these languages model data as labeled graphs and use regular path expressions to express queries that traverse arbitrary paths in graphs. Naive execution of path expressions is inefficient, however, because it often requires exhaustive graph search. We describe two optimization techniques for queries with regular path expressions, which we call regular queries. Both rely on graph schemas, which specify partial knowledge of a graph's structure. Query pruning restricts search to a fragment of the graph; we give an efficient algorithm for rewriting any regular query into a pruned one. Query rewriting using state extents can entirely eliminate or substantially reduce graph traversal; it is reminiscent of optimizing relational queries using indices. There may be several ways to optimize a query using state extents; we give an exponential-time algorith...
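The naive evaluation that the abstract calls inefficient can be sketched as a search over pairs of (graph node, automaton state). The sketch below assumes the regular path expression has already been compiled by hand into a small NFA; the graph, the labels, and the function name `eval_path_query` are hypothetical, and none of the paper's optimizations (query pruning, state extents) are implemented here.

```python
from collections import deque

def eval_path_query(graph, start, nfa, nfa_start, nfa_accept):
    """Return all nodes reachable from `start` along a path whose label
    sequence is accepted by the NFA.

    graph: node -> list of (label, target) edges
    nfa:   (state, label) -> set of next states
    """
    seen = {(start, nfa_start)}
    queue = deque(seen)
    answers = set()
    while queue:
        node, state = queue.popleft()
        if state in nfa_accept:
            answers.add(node)
        for label, target in graph.get(node, []):
            for nxt in nfa.get((state, label), ()):
                pair = (target, nxt)
                if pair not in seen:  # guards against cycles in the graph
                    seen.add(pair)
                    queue.append(pair)
    return answers

# Made-up labeled graph with a citation cycle p1 -> p2 -> p3 -> p1.
graph = {
    "root": [("paper", "p1")],
    "p1": [("cites", "p2"), ("title", "t1")],
    "p2": [("cites", "p3"), ("title", "t2")],
    "p3": [("cites", "p1"), ("title", "t3")],
}
# Hand-built NFA for the expression  paper.cites*.title
nfa = {
    (0, "paper"): {1},
    (1, "cites"): {1},
    (1, "title"): {2},
}
titles = eval_path_query(graph, "root", nfa, 0, {2})
```

In the worst case this visits every (node, state) pair, which is the exhaustive graph search that schema-based optimizations aim to cut down.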
DTD Inference for Views of XML Data, 1999
Cited by 106 (12 self)
We study the inference of Data Type Definitions (DTDs) for views of XML data, using an abstraction that focuses on document content structure. The views are defined by a query language that produces a list of documents selected from one or more input sources. The selection conditions involve vertical and horizontal navigation, thus querying explicitly the order present in input documents. We point out several strong limitations in the descriptive ability of current DTDs and the need for extending them with (i) a subtyping mechanism and (ii) a more powerful specification mechanism than regular languages, such as context-free languages. With these extensions, we show that one can always infer tight DTDs that precisely characterize a selection view on sources satisfying given DTDs. We also show important special cases where one can infer a tight DTD without requiring extension (ii). Finally, we consider related problems such as verifying conformance of a view definition with a predefined DTD....
Algorithmics and Applications of Tree and Graph Searching - In Symposium on Principles of Database Systems, 2002
Cited by 89 (8 self)
Modern search engines answer keyword-based queries extremely efficiently. The impressive speed is due to clever inverted index structures, caching, a domain-independent knowledge of strings, and thousands of machines. Several research efforts have attempted to generalize keyword search to keytree and keygraph searching, because trees and graphs have many applications in next-generation database systems. This paper surveys both algorithms and applications, giving some emphasis to our own work.
Integrity Constraints for XML, 1999
Cited by 79 (12 self)
In this paper, we extend XML DTDs with several classes of integrity constraints and investigate the complexity of reasoning about these constraints. The constraints range over keys, foreign keys, inverse constraints as well as ID constraints for capturing the semantics of object identities. They improve semantic specifications and provide a better reference mechanism for native XML applications. They are also useful in information exchange and data integration for preserving the semantics of data originating in relational and object-oriented databases. We establish complexity and axiomatization results for the (finite) implication problems associated with these constraints. In addition, we study implication of more general constraints, such as functional, inclusion and inverse constraints defined in terms of navigation paths.

