While plain pattern matching described in the previous section is
adequate for more complex program manipulation involving tree- or graph-like
data structures, higher level constructs such as rewriting and
inference are also available. In the rewriting formalism, equational
rules of the form lhs ->
rhs are specified by the user.
During processing, each instance of the lhs in a complex tree is replaced
by an instance of the rhs, until no such replacement is possible.
Equational rules can often be used to specify semantics based simplification
(e.g. constant folding and simplification based on simple algebraic
identities) or transformation(e.g. code selection in a compiler
backend[AGT89]).
Unlike plain pattern matching, however, the structural traversal process in rewriting is implicitly inferred from the type structure of an algebraic datatype, as specified in its definition. Thus when changing the structure of a datatype, unaffected patterns in rewriting rules do not have to be altered.
There are two main forms of rewriting modes available:
Rewrite class
Each independent set of rewriting rules in Prop is encapsulated in its own rewrite class. A rewrite class is basically a normal C++ class with a set of rewriting rules attached. During rewriting, the data members and the member functions are visible according to the normal C++ scoping rules. This makes it is easy to encapsulate additional data computed as a side effect during the rewriting process.
A rewriting example
Consider an abbreviated simplifier for the well-formed formula datatype
Wff
defined in the previous section. The rewrite class for this
can be defined as follows. Since there is no encapsulated data in this
example, only the default constructor for the class needs to be defined.
A rewrite class definition requires the rewriting protocol, which is
simply a list of datatypes involved in the rewriting traversal process,
to be specified. In this instance only Wff
is needed.
rewrite class Simplify (Wff) { public: Simplify() {} };
The rewrite rules for the simplifier can then be specified succinctly as
follows. Like the match
statement, in general the rhs
of a rewrite rule can be any statement. A special statement
rewrite(e)
can be used to rewrite the current redex into another form.
If the rhs is of the form rewrite(e)
, then it can be abbreviated
to e
, as in below:
rewrite Simplify { And(F, _): F | And(_, F): F | And(T, ?X): ?X | And(?X, T): ?X | Or (T, _): T | Or (_, T): T | Or (F, ?X): ?X | Or (?X, F): ?X | Not(Not(?X)): ?X | Not(And(?X,?Y)): Or(Not(?X), Not(?Y)) | Not(Or(?X,?Y)): And(Not(?X), Not(?Y)) | Implies(?X,?Y): Or(Not(?X), ?Y) | And (?X, ?X): ?X | Or (?X, ?X): ?X | Implies (?X, ?X): ?X // etc ... };
The rewrite class definition creates a new class of the same name. This new
class defines an implicit operator ()
with the protocol below. This
member function can be invoked to perform the rewriting in a functional
syntax.
class Simplify : ... { { ... public: void operator () (Wff); // Wff operator () (Wff); // if rewrite class is applicative }; Wff wff = ...; Simplify simplify; // create a new instance of the rewrite class simplify(wff); // rewrite the term wff
State caching
Replacements during rewriting often require state information to be recomputed to further the matching process. Since computation of state encoding can involve a complete traversal of a term, replacement can become expensive if the replacement term is large. For instance, consider the following replacement rule, which replaces all expressions of the form 2*x into x+x:
rewrite class StrengthReduction { MUL (INT 2, ?x): ADD(?x, ?x) ... }
Since the subterm ?x
could be arbitrarily large, recomputing the
state encoding for ADD(?x,?x)
takes time in proportion to the
size of ?x
. In order to speedup this replacement process,
state encoding caching can be enabled, which in the example above means
that the state encoding for ADD(?x,?x)
can be recomputed directly
from the state encoding of ?x
. State caching is enabled by
adding a rewrite
qualifiers in the definition of a datatype, as in:
datatype Exp :: rewrite = INT (int) | ID (const char *) | ADD (Exp, Exp) | SUB (Exp, Exp) | MUL (Exp, Exp) | DIV (Exp, Exp) ;
Conditional rewriting and actions
Rewriting rules may be guarded with predicates to limit their applicability. In addition, the rhs of a rewrite rule is not limited to only a replacement expression: in general, any arbitrarily complex sequence of code may be used. For example, in the following set of rewriting rules we use guards to prevent undesirable replacements to be made during expression constant folding:
rewrite class ConstantFolding { ADD (INT a, INT b): INT(a+b) | SUB (INT a, INT b): INT(a-b) | MUL (INT a, INT b): { int c = a * b; // silent overflow if (a == 0 || b == 0 || c / b == a) // no overflow? { rewrite(INT(c)); } else { cerr << "Overflow in multiply\n"; } } | DIV (INT a, INT b) | b == 0: { cerr << "Division by zero\n"; } | DIV (INT a, INT b): INT(a/b) | // etc... };
The rewrite statement
While the rewrite class construct provides a very general abstraction
for rewriting, in general its full power is unneeded. It is often
convenient to be able to perform rewriting on a term without having
to make a new name for a class just for the occasion, especially if member
functions and member data are unneeded. To accommodate these situations,
the rewrite
statement is provided
to perform a set rewriting transformations on a term without having
to define a temporary rewrite class. It is simply syntactic sugar
for the more general(but cumbersome) rewrite class and rewrite
rules specifications.
For example, a simplify routine for type Exp
defined above can be
specified as follows:
Exp simplify (Exp e) { // transformations on e before rewrite (e) type (Exp) { ADD (INT a, INT b): INT(a+b) | SUB (INT a, INT b): INT(a-b) | MUL (INT a, INT b): INT(a*b) | ... } // transformations on e after return e; }
The rewrite
normally performs the replacement in place.
An applicative version of the same can be written as follows:
Exp simplify (Exp e) { rewrite (e) => e type (Exp) { ADD (INT a, INT b): INT(a+b) | SUB (INT a, INT b): INT(a+b) | MUL (INT a, INT b): INT(a*b) | ... } return e; }
Confluence and termination
This section is under construction.
Commutivity and associativity
This section is under construction.