Prop: a C++ based pattern matching language
Overview
Prop is a multi-paradigm extension of
C++ that includes string matching, algebraic datatypes,
Standard ML-style pattern
matching, pretty printing, term and graph rewriting,
inference, and simple persistence
as built-in features. Prop is designed as a
development language for interpreters, compilers, and language
translation and transformation tools.
It simplifies the construction of these systems by providing high
level declarative and rule based formalisms on top of the traditional
procedural and object-oriented paradigms of the base language.
We design Prop with two main objectives in mind: the first objective
is to improve the productivity of programmers working in a mainstream
imperative language--- especially in domains such as compilers and
language implementation--- by introducing high level formalisms based on
trees and attributed labeled graphs, tree and graph rewriting, and
finite set theory.
By providing the users with a wide array of high level
data structures and program combining forms, we encourage the use of
the appropriate level of abstraction in each component of a language
processing system. Rather than restricted to a single imperative
mode of thinking, users can utilize applicative, transformational,
equational, deductive, imperative or even a combination of formalisms
in a cooperative manner. For example, syntactic analysis can be performed
with the parsing/lexical analysis constructs of Prop; semantic
analysis with pattern matching, tree rewriting, and inference;
optimization with the SETL-style
sub-language and graph rewriting; and, finally, code generation
and machine language level tools development with tree reduction with
dynamic programming and bit-string pattern matching.
The second objective is high performance and portability.
Programs written in Prop are lightweight and efficient: i.e.
unneeded features are never included in the runtime system of a program,
and those that are included are first transformed into interpretation
free C++ code. All features are translated into C++
(using a source to source translator written in itself)
using efficient algorithms. Maximal compatibility with the base language
is maintained by mapping all high level data structures of
directly into C++ classes. These classes can be used transparently
as if they are written by the user. Thus we minimize the
data impedance mismatch between different abstraction levels.
A result of this is that high level programs written in Prop
can readily utilize existing code and libraries with little change.
An optional conservative garbage collector
can also be linked into the runtime system for Prop programs that
desire automatic memory reclamation.
Additional Information
These are some unfinished documents on the language.
Implementation
Currently the following features have been implemented:
- String matcher, lexical analyzer, and parser generation.
Currently the grammar is restricted to LALR(1) (with operator precedence).
- ML-style algebraic datatypes and pattern matching.
- Bottom-up tree rewriting (with conditionals).
- Bottom-up tree parsing with dynamic programming for cost minimization.
- Incremental forward chaining inference using a RETE-style algorithm.
Sample Programs
The following are some simple programs written in Prop version 2.3.0.
- A program
for simplifying propositional logical formulas using
rewriting. This program includes a lexer, a parser, and
a rewriter with about 20 rules. Since all these features are integrated
in Prop , the AST constructed
from the parser can be readily used by the pretty printer
and the rewriter. There is no messy interface code to write.
The program contains only 180 lines, including comments.
Here's the generated code .
- A program that demonstrates
the inference formalism.
Here's the generated code .
- A program that uses the
persistence streams.
Here's the generated code .
- Another program that demonstrates
the rewriting feature by transforming the abstract syntax tree of a
toy imperative language. This program includes a pretty printing
and a rewriter with about 60 rules.
Here's the generated code .
Here's the output from running
the program.
- A program that demonstrates the feature of
tree parsing with cost minimization. We use it to recognize
expression idioms in a simple expression language.
Here's the generated code .
Source Code
The source code is available in
tar'ed and gzipped form. To compile the translator
and the runtime support library, g++ 2.5.8, 2.6.3, or 2.7.x is
required.
Here are the versions currently available. The latest
non-alpha version is usually the most up-to-date and stable.
The reference manual is included in all distributions.
SourceForge