Prop: a C++ based pattern matching language

Overview

Prop is a multi-paradigm extension of C++ that includes string matching, algebraic datatypes, Standard ML-style pattern matching, pretty printing, term and graph rewriting, inference, and simple persistence as built-in features. Prop is designed as a development language for interpreters, compilers, and language translation and transformation tools. It simplifies the construction of these systems by providing high level declarative and rule based formalisms on top of the traditional procedural and object-oriented paradigms of the base language.

We design Prop with two main objectives in mind: the first objective is to improve the productivity of programmers working in a mainstream imperative language--- especially in domains such as compilers and language implementation--- by introducing high level formalisms based on trees and attributed labeled graphs, tree and graph rewriting, and finite set theory. By providing the users with a wide array of high level data structures and program combining forms, we encourage the use of the appropriate level of abstraction in each component of a language processing system. Rather than restricted to a single imperative mode of thinking, users can utilize applicative, transformational, equational, deductive, imperative or even a combination of formalisms in a cooperative manner. For example, syntactic analysis can be performed with the parsing/lexical analysis constructs of Prop; semantic analysis with pattern matching, tree rewriting, and inference; optimization with the SETL-style sub-language and graph rewriting; and, finally, code generation and machine language level tools development with tree reduction with dynamic programming and bit-string pattern matching.

The second objective is high performance and portability. Programs written in Prop are lightweight and efficient: i.e. unneeded features are never included in the runtime system of a program, and those that are included are first transformed into interpretation free C++ code. All features are translated into C++ (using a source to source translator written in itself) using efficient algorithms. Maximal compatibility with the base language is maintained by mapping all high level data structures of directly into C++ classes. These classes can be used transparently as if they are written by the user. Thus we minimize the data impedance mismatch between different abstraction levels. A result of this is that high level programs written in Prop can readily utilize existing code and libraries with little change. An optional conservative garbage collector can also be linked into the runtime system for Prop programs that desire automatic memory reclamation.

Additional Information

These are some unfinished documents on the language.

Prop: a multi-paradigm pattern matching language(in HTML) (not completed). This is an introduction to the language. Alternative versions in Postscript and PDF are also available.
Language reference manual for Prop. This is the recommended source for the latest information concerning Prop. Postscript and PDF versions of above are also available.

Implementation

Currently the following features have been implemented:

String matcher, lexical analyzer, and parser generation. Currently the grammar is restricted to LALR(1) (with operator precedence).
ML-style algebraic datatypes and pattern matching.
Bottom-up tree rewriting (with conditionals).
Bottom-up tree parsing with dynamic programming for cost minimization.
Incremental forward chaining inference using a RETE-style algorithm.

Sample Programs

The following are some simple programs written in Prop version 2.3.0.

A program for simplifying propositional logical formulas using rewriting. This program includes a lexer, a parser, and a rewriter with about 20 rules. Since all these features are integrated in Prop , the AST constructed from the parser can be readily used by the pretty printer and the rewriter. There is no messy interface code to write. The program contains only 180 lines, including comments. Here's the generated code .
A program that demonstrates the inference formalism. Here's the generated code .
A program that uses the persistence streams. Here's the generated code .
Another program that demonstrates the rewriting feature by transforming the abstract syntax tree of a toy imperative language. This program includes a pretty printing and a rewriter with about 60 rules. Here's the generated code . Here's the output from running the program.
A program that demonstrates the feature of tree parsing with cost minimization. We use it to recognize expression idioms in a simple expression language. Here's the generated code .

Source Code

The source code is available in tar'ed and gzipped form. To compile the translator and the runtime support library, g++ 2.5.8, 2.6.3, or 2.7.x is required.

Here are the versions currently available. The latest non-alpha version is usually the most up-to-date and stable. The reference manual is included in all distributions.

version 2.3.2 (March 21th, 1997).
version 2.3.3 (March 30th, 1997).
version 2.3.4 (April 8th, 1997).
version 2.3.5 (June 18th, 1997). Solaris binary for 2.3.5.
version 2.3.6 (Nov 2nd, 1999). This version compiles under gcc 2.95.1

version 2.3.7 Spitfire (August 19th, 2005). GCC 3.4.3/4.0, MS VC++ 2005 Express
version 2.3.8 Mustang (September 30th, 2005).

SourceForge