Learning Deterministic Regular Expressions for the Inference of Schemas from XML Data, 2008
Cited by 13 (4 self)
Inferring an appropriate DTD or XML Schema Definition (XSD) for a given collection of XML documents essentially reduces to learning deterministic regular expressions from sets of positive example words. Unfortunately, there is no algorithm capable of learning the complete class of deterministic regular expressions from positive examples only, as we will show. The regular expressions occurring in practical DTDs and XSDs, however, are such that every alphabet symbol occurs only a small number of times. As such, in practice it suffices to learn the subclass of regular expressions in which each alphabet symbol occurs at most k times, for some small k. We refer to such expressions as k-occurrence regular expressions (k-OREs for short). Motivated by this observation, we provide a probabilistic algorithm that learns k-OREs for increasing values of k, and selects the one that best describes the sample based on a Minimum Description Length argument. The effectiveness of the method is empirically validated on both real-world and synthetic data. Furthermore, the method is shown to be conservative over the simpler classes of expressions considered in previous work.
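The k-occurrence condition described in this abstract can be illustrated with a minimal Python sketch. It only counts how often each symbol appears in an expression. It does not check determinism and is not the learning algorithm from the paper. The function names and the DTD-like content-model syntax (element names separated by `,`, `|`, `*`, `+`, `?` and parentheses) are assumptions made for illustration.

```python
import re
from collections import Counter


def symbol_occurrences(expr: str) -> Counter:
    """Count how often each element name occurs in a content-model expression.

    Assumption: symbols are identifier-like element names; everything else
    (commas, |, *, +, ?, parentheses, whitespace) is treated as an operator.
    """
    return Counter(re.findall(r"[A-Za-z_][\w.-]*", expr))


def min_k(expr: str) -> int:
    """Smallest k such that every symbol occurs at most k times in expr."""
    counts = symbol_occurrences(expr)
    return max(counts.values(), default=0)


def is_k_ore(expr: str, k: int) -> bool:
    """True if no symbol occurs more than k times (determinism is NOT checked)."""
    return min_k(expr) <= k


if __name__ == "__main__":
    # Every element name occurs once, so this expression satisfies k = 1.
    print(min_k("title, author+, (year | date)?"))
    # The symbol "a" occurs twice, so k = 2 is needed.
    print(min_k("a, (b | c)*, a?"))
```

Under these assumptions, an expression such as `a, (b | c)*, a?` needs k = 2 because `a` occurs twice, while a typical content model that mentions each element name once already fits k = 1.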
SchemaScope: a System for . . ., 2008
We present SchemaScope, a system to derive Document Type Definitions and XML Schemas from corpora of sample XML documents. Tools are provided to visualize, clean, and refine existing or inferred schemas. A number of use cases illustrate the versatility of the system, as well as various types of applications.
Mapping YANG to Document Schema Definition Languages and Validating NETCONF Content, 2011
Recognizing Matching Patterns for XML Data Using Grammar-based Data Compression Algorithm
XML is a standard format for data exchange and it is well suited to represent internet applications because of its text-based format. However, this flexibility means that it incurs higher data processing overhead than ordinary data formats. In this paper, we propose a high-performance XML processing method using a novel pattern recognition algorithm based on a grammar compression algorithm. In the method, training XML documents are pre-analyzed in order to detect frequently appearing constructs in the document. The extended XML parser uses the results of the pre-analysis to make its parsing faster with speculative input matching. The results of experiments show that the proposed method improves the performance of XML parsing by up to 182% (146% on average) compared with an ordinary SAX parser with namespace processing under the condition that the target XML documents are similar to the pre-analyzed XML documents.
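The pre-analysis step mentioned in this abstract (detecting frequently appearing constructs in training documents) can be approximated with a short Python sketch. It counts fixed-length sequences of start-tag names seen by a standard SAX handler. This is a deliberately simple stand-in, not the grammar-compression algorithm or the speculative parser from the paper, and the class and function names are hypothetical.

```python
import xml.sax
from collections import Counter


class TagSequenceCounter(xml.sax.ContentHandler):
    """Counts n-grams of consecutive start-tag names (an illustrative stand-in)."""

    def __init__(self, n: int = 2):
        super().__init__()
        self.n = n
        self.recent = []          # sliding window of the last n tag names
        self.counts = Counter()   # n-gram of tag names -> number of occurrences

    def startElement(self, name, attrs):
        self.recent.append(name)
        if len(self.recent) > self.n:
            self.recent.pop(0)
        if len(self.recent) == self.n:
            self.counts[tuple(self.recent)] += 1


def frequent_tag_sequences(documents, n: int = 2, top: int = 3):
    """Return the most common start-tag n-grams across the training documents."""
    handler = TagSequenceCounter(n)
    for doc in documents:
        handler.recent = []       # do not let n-grams span document boundaries
        xml.sax.parseString(doc, handler)
    return handler.counts.most_common(top)


if __name__ == "__main__":
    training = [b"<a><b/><c/></a>", b"<a><b/><b/></a>"]
    print(frequent_tag_sequences(training, n=2, top=1))
```

In this sketch, sequences that appear often in the training set would be the candidates a parser could try to match speculatively. How the paper represents and matches such constructs is not stated in the abstract.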

