Disambiguation Filters for Scannerless Generalized LR Parsers
- Compiler Construction (CC’02)
, 2002
Cited by 68 (13 self)
Several real-world problems call for more parsing power than is offered by the widely used and well-established deterministic parsing techniques. These techniques also create an artificial divide between lexical and context-free analysis phases, at the cost of significant complexity at their interface. In this paper we present the fusion of generalized LR parsing and scannerless parsing. This combination supports syntax definitions in which all aspects (lexical and context-free) of the syntax of a language are defined explicitly in one formalism. Furthermore, there are no restrictions on the class of grammars, thus allowing a natural syntax tree structure. Ambiguities that arise through the use of unrestricted grammars are handled by explicit disambiguation constructs, instead of implicit defaults that are taken by traditional scanner and parser generators. Hence, a syntax definition becomes a full declarative description of a language. Disambiguation constructs can be interpreted as filters on parse forests. Depending on the kind of disambiguation, filters can be applied at parser generation time, at parse time, or after parsing. Scannerless generalized LR parsing is a viable technique that has been applied in various industrial and academic projects.
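The idea of a disambiguation construct acting as a filter on a parse forest can be illustrated with a small Python sketch. It is not the SGLR implementation described in the paper: the forest is an explicit list of trees rather than a shared graph, and the names `all_parses`, `violates_left_assoc`, and `left_assoc_filter` are hypothetical. It only shows a post-parse associativity filter discarding unwanted trees.

```python
def all_parses(tokens):
    """Enumerate every binary tree for `operand (op operand)*` tokens.

    The returned list plays the role of an (unshared) parse forest.
    Operands are strings; inner nodes are (operator, left, right) tuples.
    """
    if len(tokens) == 1:
        return [tokens[0]]
    trees = []
    for i in range(1, len(tokens), 2):  # every operator position may be the root
        for left in all_parses(tokens[:i]):
            for right in all_parses(tokens[i + 1:]):
                trees.append((tokens[i], left, right))
    return trees


def violates_left_assoc(tree):
    """True if some operator node has the same operator as its right child."""
    if not isinstance(tree, tuple):
        return False
    op, left, right = tree
    if isinstance(right, tuple) and right[0] == op:
        return True
    return violates_left_assoc(left) or violates_left_assoc(right)


def left_assoc_filter(forest):
    """Post-parse filter: keep only the trees that respect left associativity."""
    return [tree for tree in forest if not violates_left_assoc(tree)]


forest = all_parses(["1", "-", "2", "-", "3"])
print(len(forest))                # number of ambiguous readings
print(left_assoc_filter(forest))  # the single left-nested tree
```

The grammar itself stays ambiguous; the declared filter decides which trees survive, which is the separation the abstract argues for.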
Concrete syntax for objects. Domain-specific language embedding and assimilation without restrictions
- Proceedings of the 19th ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA’04)
, 2004
Cited by 59 (15 self)
Application programmer’s interfaces give access to domain knowledge encapsulated in class libraries without providing the appropriate notation for expressing domain composition. Since object-oriented languages are designed for extensibility and reuse, the language constructs are often sufficient for expressing domain abstractions at the semantic level. However, they do not provide the right abstractions at the syntactic level. In this paper we describe MetaBorg, a method for providing concrete syntax for domain abstractions to application programmers. The method consists of embedding domain-specific languages in a general purpose host language and assimilating the embedded domain code into the surrounding host code. Instead of extending the implementation of the host language, the assimilation phase implements domain abstractions in terms of existing APIs, leaving the host language undisturbed. Indeed, MetaBorg can be considered a method for promoting APIs to the language level. The method is supported by proven and available technology, i.e. the syntax definition formalism SDF and the program transformation language and toolset Stratego/XT. We illustrate the method with applications in three domains: code generation, XML generation, and user-interface construction.
Building Documentation Generators
- In Proceedings; IEEE International Conference on Software Maintenance
, 1999
Cited by 54 (18 self)
In order to maintain the consistency between sources and documentation, while at the same time providing documentation at the design level, it is necessary to generate documentation from sources in such a way that it can be integrated with hand-written documentation. In order to simplify the construction of documentation generators, we introduce island grammars, which only define those syntactic structures needed for (re)documentation purposes. We explain how they can be used to obtain various forms of documentation, such as data dependency diagrams for mainframe batch jobs. Moreover, we discuss how the derived information can be made available via a hypertext structure. We conclude with an industrial case study in which a 600,000 LOC COBOL legacy system is redocumented using the techniques presented in the paper.
1991 ACM Computing Classification System: D.2.2, D.2.5, D.2.7, D.3.4
Keywords and Phrases: Redocumentation, legacy systems, documentation generation, source code ana...
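The island grammar idea in this abstract, describing only the constructs of interest and treating everything else as uninterpreted "water", can be approximated in a few lines of Python. This is a rough sketch under stated assumptions, not the grammar formalism used in the paper: the single island (a COBOL `CALL 'NAME'` statement), the regular expression, and the function name `parse_islands` are all hypothetical choices for illustration.

```python
import re

# Hypothetical island: a COBOL-style CALL of a literal program name.
ISLAND = re.compile(r"\bCALL\s+'([A-Z0-9-]+)'", re.IGNORECASE)


def parse_islands(source):
    """Split `source` into ("call", name) islands and ("water", text) chunks."""
    chunks, pos = [], 0
    for match in ISLAND.finditer(source):
        if match.start() > pos:
            chunks.append(("water", source[pos:match.start()]))
        chunks.append(("call", match.group(1)))
        pos = match.end()
    if pos < len(source):
        chunks.append(("water", source[pos:]))
    return chunks


program = """\
       MOVE A TO B.
       CALL 'PAYROLL' USING REC.
       IF X > 0 CALL 'AUDIT-LOG'.
"""
calls = [name for kind, name in parse_islands(program) if kind == "call"]
print(calls)
```

Because the water is never parsed, the extractor tolerates dialect differences and incomplete sources, at the cost of knowing nothing about the skipped text.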
Current Parsing Techniques in Software Renovation Considered Harmful
- Proceedings of the Sixth International Workshop on Program Comprehension
, 1998
Cited by 52 (15 self)
We evaluate the parsing technology used by people working in the reengineering industry. We discuss parser generators and complete systems like Yacc, TXL, TAMPR, REFINE, CobolTransformer, COSMOS, and ASF+SDF. We explain the merits and drawbacks of the various techniques. We conclude that current technology may cause problems for the reengineering industry and that modular and/or compositional parsing techniques are a possible solution.
Categories and Subject Description: D.2.6 [Software Engineering]: Programming Environments---Interactive; D.2.7 [Software Engineering]: Distribution and Maintenance---Restructuring; D.3.4 [Processors]: Parsing.
Additional Key Words and Phrases: Reengineering, System renovation, Parsing, Generalized LR parsing, compositional grammars, modular grammars.
1 Introduction
A hardly controversial statement in the reengineering community is that in order to reengineer software it is convenient to parse it. Maybe due to the overall agreement on this issue, ...
The TXL Source Transformation Language
, 2005
Cited by 47 (15 self)
TXL is a special-purpose programming language designed for creating, manipulating and rapidly prototyping language descriptions, tools and applications. TXL is designed to allow explicit programmer control over the interpretation, application, order and backtracking of both parsing and rewriting rules. Using first order functional programming at the higher level and term rewriting at the lower level, TXL provides for flexible programming of traversals, guards, scope of application and parameterized context. This flexibility has allowed TXL users to express and experiment with both new ideas in parsing, such as robust, island and agile parsing, and new paradigms in rewriting, such as XML markup, rewriting strategies and contextualized rules, without any change to TXL itself. This paper outlines the history, evolution and concepts of TXL with emphasis on its distinctive style and philosophy, and gives examples of its use in expressing and applying recent new paradigms in language processing.
Packrat Parsing: Simple, Powerful, Lazy, Linear Time
Cited by 47 (4 self)
Packrat parsing is a novel technique for implementing parsers in a lazy functional programming language. A packrat parser provides the power and flexibility of top-down parsing with backtracking and unlimited lookahead, but nevertheless guarantees linear parse time. Any language defined by an LL(k) or LR(k) grammar can be recognized by a packrat parser, in addition to many languages that conventional linear-time algorithms do not support. This additional power simplifies the handling of common syntactic idioms such as the widespread but troublesome longest-match rule, enables the use of sophisticated disambiguation strategies such as syntactic and semantic predicates, provides better grammar composition properties, and allows lexical analysis to be integrated seamlessly into parsing. Yet despite its power, packrat parsing shares the same simplicity and elegance as recursive descent parsing; in fact converting a backtracking recursive descent parser into a linear-time packrat parser often involves only a fairly straightforward structural change. This paper describes packrat parsing informally with emphasis on its use in practical applications, and explores its advantages and disadvantages with respect to the more conventional alternatives.
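The "fairly straightforward structural change" from backtracking recursive descent to packrat parsing can be sketched in Python by memoizing each rule on its input position. This is an illustration only: the paper targets a lazy functional language, whereas here `functools.lru_cache` stands in for the lazily evaluated result table, and the tiny arithmetic grammar and the name `parse_expression` are assumptions made for the example.

```python
from functools import lru_cache


def parse_expression(text):
    """Evaluate `text` with a memoized (packrat-style) recursive descent parser.

    Each rule maps an input position to (value, next_position), or None on failure.
    Memoization ensures every (rule, position) pair is computed at most once.
    """

    @lru_cache(maxsize=None)
    def additive(pos):
        # Additive <- Multitive '+' Additive / Multitive
        left = multitive(pos)
        if left is not None and left[1] < len(text) and text[left[1]] == "+":
            right = additive(left[1] + 1)
            if right is not None:
                return left[0] + right[0], right[1]
        # Ordered choice: the second alternative reuses the memoized result.
        return multitive(pos)

    @lru_cache(maxsize=None)
    def multitive(pos):
        # Multitive <- Primary '*' Multitive / Primary
        left = primary(pos)
        if left is not None and left[1] < len(text) and text[left[1]] == "*":
            right = multitive(left[1] + 1)
            if right is not None:
                return left[0] * right[0], right[1]
        return primary(pos)

    @lru_cache(maxsize=None)
    def primary(pos):
        # Primary <- '(' Additive ')' / Decimal
        if pos < len(text) and text[pos] == "(":
            inner = additive(pos + 1)
            if inner is not None and inner[1] < len(text) and text[inner[1]] == ")":
                return inner[0], inner[1] + 1
        return decimal(pos)

    @lru_cache(maxsize=None)
    def decimal(pos):
        # Decimal <- [0-9]
        if pos < len(text) and text[pos] in "0123456789":
            return int(text[pos]), pos + 1
        return None

    result = additive(0)
    if result is None or result[1] != len(text):
        raise ValueError("syntax error")
    return result[0]


print(parse_expression("2*(3+4)"))
```

Removing the `lru_cache` decorators gives an ordinary backtracking parser with the same results; the decorators are what bound the work to one evaluation per rule and position. Note that the parser works directly on characters, so no separate lexical phase is involved.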
Semi-automatic Grammar Recovery
- SOFTWARE—PRACTICE & EXPERIENCE
, 2001
Cited by 39 (9 self)
We proposed a new approach for the construction of grammars and parsers for existing languages. The approach is both very powerful and simple. We provided a structured process and explained our methods in detail so that others can apply our ideas for their own grammar construction activities. We illustrated the proposed approach with a nontrivial case study. Using our process, we constructed in a few weeks a complete and correct VS COBOL II grammar specification for IBM mainframes. We not only constructed a parser for it, but also published a web-enabled grammar specification so that others can use this result to conveniently construct their own grammar-based tools for VS COBOL II, or derivatives.
Incremental Analysis of Real Programming Languages
- In Proceedings of the 1997 ACM SIGPLAN Conference on Programming Language Design and Implementation
, 1997
Cited by 25 (6 self)
A major research goal for compilers and environments is the automatic derivation of tools from formal specifications. However, the formal model of the language is often inadequate; in particular, LR(k) grammars are unable to describe the natural syntax of many languages, such as C++ and Fortran, which are inherently non-deterministic. Designers of batch compilers work around such limitations by combining generated components with ad hoc techniques (for instance, performing partial type and scope analysis in tandem with parsing). Unfortunately, the complexity of incremental systems precludes the use of batch solutions. The inability to generate incremental tools for important languages inhibits the widespread use of language-rich interactive environments. We address this problem by extending the language model itself, introducing a program representation based on parse dags that is suitable for both batch and incremental analysis. Ambiguities unresolved by one stage are retained in this representation until further stages can complete the analysis, even if the resolution depends on further actions by the user. Representing ambiguity explicitly increases the number and variety of languages that can be analyzed incrementally using existing methods. To create this representation, we have developed an efficient incremental parser for general context-free grammars. Our algorithm combines Tomita's generalized LR parser with reuse of entire subtrees via state-matching. Disambiguation can occur statically, during or after parsing, or during semantic analysis (using existing incremental techniques); program errors that preclude disambiguation retain multiple interpretations indefinitely. Our representation and analyses gain efficiency by exploiting the local nature of ambigu...
Warm Fusion in Stratego: A Case Study in Generation of Program Transformation Systems
, 2000
Cited by 22 (13 self)
Stratego is a domain-specific language for the specification of program transformation systems. The design of Stratego is based on the paradigm of rewriting strategies: user-definable programs in a little language of strategy operators determine where and in what order transformation rules are (automatically) applied to a program. The separation of rules and strategies supports modularity of specifications. Stratego also provides generic features for specification of program traversals. In this paper we present a case study of Stratego as applied to a non-trivial problem in program transformation. We demonstrate the use of Stratego in eliminating intermediate data structures from (also known as deforesting) functional programs via the warm fusion algorithm of Launchbury and Sheard. This algorithm has been specified in Stratego and embedded in a fully automatic transformation system for kernel Haskell. The entire system consists of about 2600 lines of specification code, which bre...
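The separation of rules and strategies described in this abstract can be mimicked in Python. The sketch below is an analogue, not Stratego syntax or its library: terms are nested tuples, a rule returns `None` when it does not apply, and the combinators `try_`, `all_children`, and `bottomup` as well as the rule `add_zero` are hypothetical names chosen for the example.

```python
def try_(strategy):
    """Apply `strategy`; if it fails (returns None), leave the term unchanged."""
    def apply(term):
        result = strategy(term)
        return term if result is None else result
    return apply


def all_children(strategy):
    """Apply `strategy` to every direct subterm; fail if any application fails."""
    def apply(term):
        if not isinstance(term, tuple):
            return term  # leaves have no subterms
        head, *args = term
        new_args = []
        for arg in args:
            result = strategy(arg)
            if result is None:
                return None
            new_args.append(result)
        return (head, *new_args)
    return apply


def bottomup(strategy):
    """Generic traversal: rewrite all subterms first, then the term itself."""
    def apply(term):
        rewritten = all_children(bottomup(strategy))(term)
        return None if rewritten is None else strategy(rewritten)
    return apply


def add_zero(term):
    """Rule: Add(x, 0) -> x. Returns None when the rule does not match."""
    if isinstance(term, tuple) and len(term) == 3 and term[0] == "Add" and term[2] == 0:
        return term[1]
    return None


# The strategy decides where and in what order the rule is tried.
simplify = bottomup(try_(add_zero))
print(simplify(("Mul", ("Add", ("Add", "y", 0), 0), 2)))
```

The rule `add_zero` says nothing about traversal, so the same rule could be paired with a different strategy without being rewritten.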

