Results 1 - 10
of
43
The next 700 data description languages
- In ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages
, 2006
"... governmental, scientific, and private data. Because they have been standardized and are widely used, many reliable, efficient, and convenient tools for processing data in these formats are readily available. For instance, your favorite programming language undoubtedly has libraries for parsing XML a ..."
Abstract
-
Cited by 27 (8 self)
- Add to MetaCart
governmental, scientific, and private data. Because they have been standardized and are widely used, many reliable, efficient, and convenient tools for processing data in these formats are readily available. For instance, your favorite programming language undoubtedly has libraries for parsing XML and HTML as well as reading and transforming images in JPEG or movies in MPEG. Query engines are available for querying XML documents. Widely-used applications like Microsoft Word and Excel automatically translate documents between HTML and other standard formats. In short, life is good when working with standard data formats. In an ideal world, all data would be in such formats. In reality, however, we are not nearly so fortunate. An ad hoc data format is any non-standard data format. Typically, such formats do not have parsing, querying, analysis, or transformation tools readily available. Every day, network administrators, financial analysts, computer scientists, biologists, chemists, astronomers, and physicists deal with ad hoc data in a myriad of complex formats. Figure 1 gives a partial sense of the range and pervasiveness of such data. Since off-the-shelf tools for processing these ad hoc data formats do not exist or are not readily available, talented scientists, data analysts, and programmers must waste their time on low-level chores like parsing and format translation to extract the valuable information they need from their data.
PADS/ML: A Functional Data Description Language
, 2007
"... Massive amounts of useful data are stored and processed in ad hoc formats for which common tools like parsers, printers, query engines and format converters are not readily available. In this paper, we explain the design and implementation of PADS/ML, a new language and system that facilitates the g ..."
Abstract
-
Cited by 17 (8 self)
- Add to MetaCart
Massive amounts of useful data are stored and processed in ad hoc formats for which common tools like parsers, printers, query engines and format converters are not readily available. In this paper, we explain the design and implementation of PADS/ML, a new language and system that facilitates the generation of data processing tools for ad hoc formats. The PADS/ML design includes features such as dependent, polymorphic and recursive datatypes, which allow programmers to describe the syntax and semantics of ad hoc data in a concise, easy-to-read notation. The PADS/ML implementation compiles these descriptions into ML structures and functors that include types for parsed data, functions for parsing and printing, and auxiliary support for user-specified, format-dependent and format-independent tool generation.
OMeta: an Object-Oriented Language for Pattern-Matching
, 2007
"... This paper introduces OMeta, a new object-oriented language for pattern matching. OMeta is based on a variant of Parsing Expression Grammars (PEGs) [5]—a recognitionbased foundation for describing syntax—which we have extended to handle arbitrary kinds of data. We show that OMeta’s general-purpose p ..."
Abstract
-
Cited by 16 (2 self)
- Add to MetaCart
This paper introduces OMeta, a new object-oriented language for pattern matching. OMeta is based on a variant of Parsing Expression Grammars (PEGs) [5]—a recognitionbased foundation for describing syntax—which we have extended to handle arbitrary kinds of data. We show that OMeta’s general-purpose pattern matching provides a natural and convenient way for programmers to implement tokenizers, parsers, visitors, and tree transformers, all of which can be extended in interesting ways using familiar object-oriented mechanisms. This makes OMeta particularly well-suited as a medium for experimenting with new designs for programming languages and extensions to existing languages.
Practical packrat parsing
, 2004
"... A considerable number of research projects are exploring how to extend object-oriented programming languages such as Java with, for example, support for generics, multiple dispatch, or pattern matching. To keep up with these changes, language implementors need appropriate tools. In this context, eas ..."
Abstract
-
Cited by 9 (1 self)
- Add to MetaCart
A considerable number of research projects are exploring how to extend object-oriented programming languages such as Java with, for example, support for generics, multiple dispatch, or pattern matching. To keep up with these changes, language implementors need appropriate tools. In this context, easily extensible parser generators are especially important because parsing program sources is a necessary first step for any language processor, be it a compiler, syntaxhighlighting editor, or API documentation generator. Unfortunately, context-free grammars and the corresponding LR or LL parsers, while well understood and widely used, are also unnecessarily hard to extend. To address this lack of appropriate tools, we introduce Rats!, a parser generator for Java that supports easily modifiable grammars and avoids the complexities associated with altering LR or LL grammars. Our work builds on recent research on packrat parsers, which are recursive descent parsers that perform backtracking but also memoize all intermediate results (hence their name), thus ensuring linear-time performance. Our work makes this parsing technique, which has been developed in the context of functional programming languages, practical for object-oriented languages. Furthermore, our parser generator supports simpler grammar specifications and more convenient error reporting, while also producing better performing parsers through aggressive optimizations. In this paper, we motivate the need for more easily extensible parsers, describe our parser generator and its optimizations in detail, and present the results of our experimental evaluation. Categories and Subject Descriptors D.3.4 [Programming Languages]: Processors—compilers, optimization, parsing, translator writing systems and compiler generators
T.: Packrat parsers can support left recursion
- In: Proc. PEPM, ACM
, 2008
"... Packrat parsing offers several advantages over other parsing techniques, such as the guarantee of linear parse times while supporting backtracking and unlimited look-ahead. Unfortunately, the limited support for left recursion in packrat parser implementations makes them difficult to use for a large ..."
Abstract
-
Cited by 6 (0 self)
- Add to MetaCart
Packrat parsing offers several advantages over other parsing techniques, such as the guarantee of linear parse times while supporting backtracking and unlimited look-ahead. Unfortunately, the limited support for left recursion in packrat parser implementations makes them difficult to use for a large class of grammars (Java’s, for example). This paper presents a modification to the memoization mechanism used by packrat parser implementations that makes it possible for them to support (even indirectly or mutually) left-recursive rules. While it is possible for a packrat parser with our modification to yield super-linear parse times for some left-recursive grammars, our experiments show that this is not the case for typical uses of left recursion. D.3.4 [Programming Lan-
Parsing expression grammar as a primitive recursive-descent parser with backtracking
- Fundamenta Informaticae
, 2007
"... Two recent developments in the field of formal languages are Parsing Expression Grammar (PEG) and packrat parsing. The PEG formalism is similar to BNF, but defines syntax in terms of recognizing strings, rather than constructing them. It is, in fact, precise specification of a backtracking recursive ..."
Abstract
-
Cited by 5 (1 self)
- Add to MetaCart
Two recent developments in the field of formal languages are Parsing Expression Grammar (PEG) and packrat parsing. The PEG formalism is similar to BNF, but defines syntax in terms of recognizing strings, rather than constructing them. It is, in fact, precise specification of a backtracking recursivedescent parser. Packrat parsing is a general method to handle backtracking in recursive-descent parsers. It ensures linear working time, at a huge memory cost. This paper reports an experiment that consisted of defining the syntax of Java 1.5 in PEG formalism, and literally transcribing the PEG definitions into parsing procedures (accidentally, also in Java). The resulting primitive parser shows an acceptable behavior, indicating that packrat parsing might be an overkill for practical languages. The exercise with defining the Java syntax suggests that more work is needed on PEG as a language specification tool. 1
Semantics and Algorithms for Data-dependent Grammars
"... Traditional parser generation technologies are incapable of handling the demands of modern programmers. In this paper, we present the design and theory of a new parsing engine, YAKKER, capable of handling the requirements of modern applications including (1) full scannerless context-free grammars wi ..."
Abstract
-
Cited by 5 (3 self)
- Add to MetaCart
Traditional parser generation technologies are incapable of handling the demands of modern programmers. In this paper, we present the design and theory of a new parsing engine, YAKKER, capable of handling the requirements of modern applications including (1) full scannerless context-free grammars with (2) regular expressions as right-hand sides for defining nonterminals. YAKKER also includes (3) facilities for binding variables to intermediate parse results and (4) using such bindings within arbitrary constraints to control parsing. These facilities allow the kind of data-dependent parsing commonly needed in systems applications, particularly those that operate over binary data. In addition, (5) nonterminals may be parameterized by arbitrary values, which gives the system good modularity and abstraction properties in the presence of data-dependent parsing. Finally, (6) legacy parsing libraries, such as sophisticated libraries for dates and times, may be directly incorporated into parser specifications. We illustrate the importance and utility of this rich format specification language by presenting its use on examples ranging from difficult programming language grammars to web server logs to binary data specification. We also show that our grammars have important compositionality properties and explain why such properties are important in modern applications such as automatic grammar induction. In terms of technical contributions, we provide a traditional high-level semantics for our new grammar formalization and show how to compile these grammars into nondeterministic automata. These automata are stack-based, somewhat like conventional pushdown automata, but are also equipped with environments to track data-dependent parsing state. We prove the correctness of our translation of data-dependent grammars into these new automata and then show how to implement the automata efficiently using a variation of Earley’s parsing algorithm. 1.
Systems need languages need systems
- In 2nd ECOOP Workshop on Programming Languages and Operating Systems
, 2005
"... Domain-specific language and compiler extensions can significantly reduce the complexity of systems, especially when written in C or C++. However, instead of the current variety of ad-hoc implementation strategies, which include preprocessor macros, C++ templates, and custom-built language processor ..."
Abstract
-
Cited by 4 (1 self)
- Add to MetaCart
Domain-specific language and compiler extensions can significantly reduce the complexity of systems, especially when written in C or C++. However, instead of the current variety of ad-hoc implementation strategies, which include preprocessor macros, C++ templates, and custom-built language processors, system builders need a uniform and general facility for realizing their own extensions. In this paper, we argue that macros can provide this solution, as they provide a concise specification of how to transform a program at compile time. We develop the requirements for a suitable macro facility, outline the structure of the corresponding macro processor, called xtc for eXTensible C, and explore the implications of starting to build it. We conclude that only a tighter integration between system and language efforts can help us achieve the benefits of language and compiler extensibility and cope with the complexity of modern systems. 1 Systems Need... The utilization of programming language and compiler technologies in operating and distributed systems certainly is not new [2]; notable examples include the Pilot [13], Oberon [16], and SPIN [3] operating systems as well as the Fox Project’s network protocol stack [4]. Yet the vast majority of system builders continues to rely on C (or C++) instead of fully type-safe or functional languages. When they need additional expressive power, they tend to add their own, domain-specific idioms to the base language by using a variety of ad-hoc implementation strategies, including preprocessor macros, C++ templates, or even custom-built language processors. However, the unfortunate result is that existing language and compiler extensions are mutually incompatible and that new extensions are hard to realize. To illustrate the power of domain-specific language and compiler extensions and the range of implementation strategies, consider libasync [10] for building eventdriven distributed services, Capriccio [15] for building multi-threaded servers, and MACEDON [14] for building overlay networks. Common to these projects is that they all build on C or C++, that runtime support is not sufficient, and that they require language or compiler support. At the same time, they differ significantly in
PINOCCHIO: Bringing Reflection to Life with First-Class Interpreters
"... To support development tools like debuggers, runtime systems need to provide a meta-programming interface to alter their semantics and access internal data. Reflective capabilities are typically fixed by the Virtual Machine (VM). Unanticipated reflective features must either be simulated by complex ..."
Abstract
-
Cited by 4 (3 self)
- Add to MetaCart
To support development tools like debuggers, runtime systems need to provide a meta-programming interface to alter their semantics and access internal data. Reflective capabilities are typically fixed by the Virtual Machine (VM). Unanticipated reflective features must either be simulated by complex program transformations, or they require the development of a specially tailored VM. We propose a novel approach to behavioral reflection that eliminates the barrier between applications and the VM by manipulating an explicit tower of first-class interpreters. PINOCCHIO is a proof-ofconcept implementation of our approach which enables radical changes to the interpretation of programs by explicitly instantiating subclasses of the base interpreter. We illustrate the design of PINOCCHIO through non-trivial examples that extend runtime semantics to support debugging, parallel debugging, and back-in-time object-flow debugging. Although performance is not yet addressed, we also discuss numerous opportunities for optimization, which we believe will lead to a practical approach to behavioral reflection.
Abstract Executable Grammars in Newspeak
, 2007
"... We describe the design and implementation of a parser combinator library in Newspeak, a new language in the Smalltalk family. Parsers written using our library are remarkably similar to BNF; they are almost entirely free of solutionspace (i.e., programming language) artifacts. Our system allows the ..."
Abstract
-
Cited by 4 (0 self)
- Add to MetaCart
We describe the design and implementation of a parser combinator library in Newspeak, a new language in the Smalltalk family. Parsers written using our library are remarkably similar to BNF; they are almost entirely free of solutionspace (i.e., programming language) artifacts. Our system allows the grammar to be specified as a separate class or mixin, independent of tools that rely upon it such as parsers, syntax colorizers etc. Thus, our grammars serve as a shared executable specification for a variety of language processing tools. This motivates our use of the term executable grammar. We discuss the language features that enable these pleasing results, and, in contrast, the challenge our library poses for static type systems. 1

