Results 1 - 10 of 19
Curated databases
PODS'08, 2008
Cited by 63 (10 self)
Curated databases are databases that are populated and updated with a great deal of human effort. Most reference works that one traditionally found on the reference shelves of libraries – dictionaries, encyclopedias, gazetteers etc. – are now curated databases. Since it is now easy to publish databases on the web, there has been an explosion in the number of new curated databases used in scientific research. The value of curated databases lies in the organization and the quality of the data they contain. Like the paper reference works they have replaced, they usually represent the efforts of a dedicated group of people to produce a definitive description of some subject area. Curated databases present a number of challenges for database research. The topics of annotation, provenance, and citation are central, because curated databases are heavily cross-referenced with, and include data from, other databases, and much of the work of a curator is annotating existing data. Evolution of structure is important because these databases often evolve from semistructured representations, and because they have to accommodate new scientific discoveries. Much of the work in these areas is in its infancy, but it is beginning to suggest new research for both theory and practice. We discuss some of this research and emphasize the need to find appropriate models of the processes associated with curated databases.
Succinctness of the Complement and Intersection of Regular Expressions
2008
Cited by 17 (5 self)
We study the succinctness of the complement and intersection of regular expressions. In particular, we show that when constructing a regular expression defining the complement of a given regular expression, a double exponential size increase cannot be avoided. Similarly, when constructing a regular expression defining the intersection of a fixed and an arbitrary number of regular expressions, an exponential and double exponential size increase, respectively, cannot be avoided in the worst case. All mentioned lower bounds improve the existing ones by one exponential and are tight in the sense that the target expression can be constructed in the corresponding time class, i.e., exponential or double exponential time. As a by-product, we generalize a theorem by Ehrenfeucht and Zeiger stating that there is a class of DFAs which are exponentially more succinct than regular expressions, to a fixed four-letter alphabet. When the given regular expressions are one-unambiguous, as for instance required by the XML Schema specification, the complement can be computed in polynomial time whereas the bounds concerning intersection continue to hold. For the subclass of single-occurrence regular expressions, we prove a tight exponential lower bound for intersection.
Efficient inclusion for a class of XML types with interleaving and counting
 Proceedings of the 11th International Symposium on Database Programming Languages, DBPL 2007
Cited by 14 (4 self)
Inclusion between XML types is important but expensive, and is much more expensive when unordered types are considered. We prove here that inclusion for XML types with interleaving and counting can be decided in polynomial time in the presence of two important restrictions: no element appears twice in the same content model, and Kleene star is only applied to disjunctions of single elements. Our approach is based on the transformation of each such content model into a set of constraints that completely characterizes the generated language. We then reduce inclusion checking to constraint implication. We exhibit a quadratic algorithm to perform inclusion checking on a RAM machine.
Simple off the shelf abstractions for XML Schema
Cited by 10 (6 self)
Although the advent of XML Schema [25] has rendered DTDs obsolete, research on practical XML optimization is mostly biased towards DTDs and tends to largely ignore XSDs (some notable exceptions notwithstanding). One
Complexity of decision problems for XML schemas and chain regular expressions
SIAM J. Comput.
Cited by 9 (6 self)
We study the complexity of the inclusion, equivalence, and intersection problem of extended CHAin Regular Expressions (eCHAREs). These are regular expressions with a very simple structure: they basically consist of the concatenation of factors, where each factor is a disjunction of strings, possibly extended with “∗”, “+”, or “?”. Though of a very simple form, the usage of such expressions is widespread as eCHAREs, for instance, constitute a superclass of the regular expressions most frequently used in practice in schema languages for XML. In particular, we show that all our lower and upper bounds for the inclusion and equivalence problem carry over to the corresponding decision problems for extended context-free grammars, and to single-type and restrained competition tree grammars. These grammars form abstractions of Document Type Definitions (DTDs), XML Schema definitions (XSDs) and the class of one-pass preorder typeable XML schemas, respectively. For the intersection problem, we show that the obtained complexities only carry over to DTDs. In this respect, we also study two other classes of regular expressions related to XML: deterministic expressions and expressions where the number of occurrences of alphabet symbols is bounded by a constant.
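To make the shape of a chain regular expression concrete, the following is a minimal Python sketch and is not taken from the paper. It assumes each factor is a tuple of alternative strings plus one optional modifier, and compiles the chain into an ordinary `re` pattern for membership testing. The names `Factor`, `echare_to_pattern`, `matches` and `EXAMPLE` are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Factor:
    """One factor of a chain expression: a disjunction of strings plus a modifier."""
    alternatives: tuple   # e.g. ("a", "bc")
    modifier: str = ""    # one of "", "*", "+", "?"


def echare_to_pattern(factors):
    """Concatenate the factors into an ordinary regular expression string."""
    parts = []
    for f in factors:
        if f.modifier not in ("", "*", "+", "?"):
            raise ValueError(f"unsupported modifier: {f.modifier!r}")
        alts = "|".join(re.escape(s) for s in f.alternatives)
        parts.append(f"(?:{alts}){f.modifier}")
    return "".join(parts)


def matches(factors, word):
    """Membership test: does the whole word belong to the chain's language?"""
    return re.fullmatch(echare_to_pattern(factors), word) is not None


# Hypothetical example chain: (a|b)* c (d|ef)?
EXAMPLE = [Factor(("a", "b"), "*"), Factor(("c",)), Factor(("d", "ef"), "?")]
```

The sketch only illustrates the syntax of such expressions; the decision problems studied in the abstract (inclusion, equivalence, intersection) are not addressed by it.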
Regular Expressions with Counting: Weak versus Strong Determinism
Cited by 6 (3 self)
We study deterministic regular expressions extended with the counting operator. There exist two notions of determinism, strong and weak determinism, which almost coincide for standard regular expressions. This, however, changes dramatically in the presence of counting. In particular, we show that weakly deterministic expressions with counting are exponentially more succinct and strictly more expressive than strongly deterministic ones, even though they still do not capture all regular languages. In addition, we present a finite automaton model with counters, study its properties and investigate the natural extension of the Glushkov construction translating expressions with counting into such counting automata. This translation yields a deterministic automaton if and only if the expression is strongly deterministic. These results then also allow us to derive upper bounds for decision problems for strongly deterministic expressions with counting.
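For standard regular expressions without counting, the usual Glushkov-based determinism test can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the counting-automaton construction of the paper: expressions are given as nested tuples, every symbol occurrence becomes a position, and the expression is reported as deterministic when neither the first set nor any follow set contains two positions carrying the same symbol. The AST encoding and the names `glushkov`, `is_deterministic`, `NONDET` and `DET` are hypothetical.

```python
from itertools import count


def glushkov(expr):
    """Return (first, follow, symbol_of) for a regex AST.

    AST nodes: ("sym", a), ("cat", l, r), ("alt", l, r), ("star", e), ("opt", e).
    """
    ids = count()
    symbol_of = {}   # position -> alphabet symbol
    follow = {}      # position -> set of positions that may come next

    def walk(node):
        """Return (nullable, first positions, last positions) of node."""
        kind = node[0]
        if kind == "sym":
            p = next(ids)
            symbol_of[p] = node[1]
            follow[p] = set()
            return False, {p}, {p}
        if kind in ("cat", "alt"):
            n1, f1, l1 = walk(node[1])
            n2, f2, l2 = walk(node[2])
            if kind == "alt":
                return n1 or n2, f1 | f2, l1 | l2
            # Concatenation: the end of the left part may be followed
            # by the start of the right part.
            for p in l1:
                follow[p] |= f2
            first = (f1 | f2) if n1 else f1
            last = (l1 | l2) if n2 else l2
            return n1 and n2, first, last
        if kind in ("star", "opt"):
            n, f, l = walk(node[1])
            if kind == "star":
                # Iteration: the end of the body may loop back to its start.
                for p in l:
                    follow[p] |= f
            return True, f, l
        raise ValueError(f"unknown node kind: {kind!r}")

    _, first, _ = walk(expr)
    return first, follow, symbol_of


def is_deterministic(expr):
    """True if no first/follow set holds two positions with the same symbol."""
    first, follow, symbol_of = glushkov(expr)
    for positions in [first, *follow.values()]:
        symbols = [symbol_of[p] for p in positions]
        if len(symbols) != len(set(symbols)):
            return False
    return True


# (a|b)*a : two competing positions for "a" at the start.
NONDET = ("cat", ("star", ("alt", ("sym", "a"), ("sym", "b"))), ("sym", "a"))
# b*a(b*a)* : same language, written deterministically.
DET = ("cat",
       ("cat", ("star", ("sym", "b")), ("sym", "a")),
       ("star", ("cat", ("star", ("sym", "b")), ("sym", "a"))))
```

The sketch builds the follow sets explicitly and can therefore take quadratic time; it does not handle counting, where, as the abstract notes, weak and strong determinism diverge.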
Efficient asymmetric inclusion between regular expression types
In ICDT, 2009
Cited by 4 (0 self)
The inclusion of Regular Expressions (REs) is the kernel of any typechecking algorithm for XML manipulation languages. XML applications would benefit from the extension of REs with interleaving and counting, but this is not feasible in general, since inclusion is EXPSPACE-complete for such extended REs. In [9] we introduced a notion of “conflict-free REs”, which are extended REs with excellent complexity behaviour, including a cubic inclusion algorithm [9] and linear membership [10]. Conflict-free REs have interleaving and counting, but the complexity is tamed by the “conflict-free” limitations, which have been found to be satisfied by the vast majority of the content models published on the Web. However, a typechecking algorithm needs to compare machine-generated subtypes against human-defined supertypes.
Checking Determinism of XML Schema Content Models in Optimal Time
Cited by 3 (0 self)
We consider the determinism checking of XML Schema content models, as required by the W3C Recommendation. We argue that currently applied solutions have flaws and make processors vulnerable to exponential resource needs by pathological schemas, and we help to eliminate this potential vulnerability of XML Schema based systems. XML Schema content models are essentially regular expressions extended with numeric occurrence indicators. A previously published polynomial-time solution to check the determinism of such expressions is improved to run in linear time, and the improved algorithm is implemented and evaluated experimentally. When compared to the corresponding method of a popular production-quality XML Schema processor, the new implementation runs orders of magnitude faster. Enhancing the solution to take further extensions of XML Schema into account without compromising its linear scalability is also discussed. Key words: Regular expression, numeric occurrence indicator, one-unambiguity, weak determinism, unique particle attribution, Java
Linear Time Membership for a Class of XML Types with Interleaving and Counting
Cited by 2 (2 self)
Regular Expressions (REs) form the basis of most XML type languages, such as DTDs, XML Schema types, and XDuce types (Thompson et al. 2004; Hosoya and Pierce 2003). In this context, the interleaving operator would be a natural addition to the language of REs, as witnessed by the presence of limited forms of interleaving in XSD (the all group), RelaxNG, and SGML. Unfortunately, membership checking for REs with interleaving is NP-hard in general. We present here a restricted class of REs with interleaving and counting which admits a linear membership algorithm. This restricted class is known to be expressive enough for the vast majority of the content models used in real-world DTDs and XSD schemas; moreover, we have proved in (Ghelli et al. 2007) that the same class admits a polynomial algorithm for subtyping and type equivalence, problems which are EXPSPACE-complete for the full language of REs with interleaving. We first present an algorithm for membership of a list of words into a RE with interleaving and counting, based on the translation of the RE into a set of constraints. We generalize the approach in order to check membership of XML trees into a class of EDTDs with interleaving and counting, which models the crucial aspects of DTDs and XSD schemas. Finally, we extend the approach to REs with intersection.
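The idea of replacing an expression by constraints can be illustrated on a much smaller fragment than the one treated in the paper. The Python sketch below assumes a type that is a pure interleaving of distinct symbols, each with its own occurrence bounds, such as a[1..2] & b[0..*] & c[1..1]. For that fragment, membership reduces to one counting pass over the word. The function `member` and the `BOUNDS` encoding are hypothetical and do not reproduce the paper's algorithm.

```python
from collections import Counter


def member(word, bounds):
    """Check a word against an interleaving of counted, distinct symbols.

    bounds maps each allowed symbol to (min, max); max=None means unbounded.
    Order is irrelevant under interleaving, so only the counts matter.
    """
    counts = Counter(word)
    # Any symbol outside the type violates the constraints immediately.
    if any(sym not in bounds for sym in counts):
        return False
    for sym, (lo, hi) in bounds.items():
        n = counts[sym]
        if n < lo or (hi is not None and n > hi):
            return False
    return True


# Hypothetical type: a[1..2] & b[0..*] & c[1..1]
BOUNDS = {"a": (1, 2), "b": (0, None), "c": (1, 1)}
```

The check runs in time linear in the length of the word plus the number of symbols in the type. Concatenation, union and nesting, which the restricted class in the abstract also supports, would need additional ordering and co-occurrence constraints that this sketch omits.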
Simplifying Regular Expressions: A Quantitative Perspective
2009
"... Research Report ..."