Tree rewriting

Next: Inference Up: The Language Previous: Algebraic datatypes and

Tree rewriting

While plain pattern matching described in the previous section is adequate for more complex program manipulation involving tree- or graph-like data structures, higher level constructs such as rewriting and inference are also available. In the rewriting formalism, equational rules of the form lhs -> rhs are specified by the user. During processing, each instance of the lhs in a complex tree is replaced by an instance of the rhs, until no such replacement is possible. Equational rules can often be used to specify semantics based simplification (e.g. constant folding and simplification based on simple algebraic identities) or transformation(e.g. code selection in a compiler backend[AGT89]).

Unlike plain pattern matching, however, the structural traversal process in rewriting is implicitly inferred from the type structure of an algebraic datatype, as specified in its definition. Thus when changing the structure of a datatype, unaffected patterns in rewriting rules do not have to be altered.

There are two main forms of rewriting modes available:

The first is normalization mode: a given tree is reduced using the matching rules until no more redexes are available. There are two modes of operations available:
- in replacement mode, the redex of a tree will be physically overwritten.
- in applicative mode, on the other hand, a new tree corresponding to the replacement value will be constructed.
Replacement mode is used as the default since it is usually the more efficient of the two.
The second form is reduction and transformation. In this mode a tree parse of the input term is computed. If cost functions are attached to the rules, then they will also be used to determine a minimal cost reduction sequence. During this process attached actions of a rule may be invoked to synthesize new data.

Rewrite class

Each independent set of rewriting rules in Prop is encapsulated in its own rewrite class. A rewrite class is basically a normal C++ class with a set of rewriting rules attached. During rewriting, the data members and the member functions are visible according to the normal C++ scoping rules. This makes it is easy to encapsulate additional data computed as a side effect during the rewriting process.

A rewriting example

Consider an abbreviated simplifier for the well-formed formula datatype Wff defined in the previous section. The rewrite class for this can be defined as follows. Since there is no encapsulated data in this example, only the default constructor for the class needs to be defined. A rewrite class definition requires the rewriting protocol, which is simply a list of datatypes involved in the rewriting traversal process, to be specified. In this instance only Wff is needed.

   rewrite class Simplify (Wff)
   {
   public:
      Simplify() {}
   };

The rewrite rules for the simplifier can then be specified succinctly as follows. Like the match statement, in general the rhs of a rewrite rule can be any statement. A special statement rewrite(e) can be used to rewrite the current redex into another form. If the rhs is of the form rewrite(e), then it can be abbreviated to e, as in below:

   rewrite Simplify
   {  And(F,  _):      F
   |  And(_,  F):      F
   |  And(T,  ?X):     ?X
   |  And(?X, T):      ?X
   |  Or (T,  _):      T
   |  Or (_,  T):      T
   |  Or (F,  ?X):     ?X
   |  Or (?X, F):      ?X
   |  Not(Not(?X)):    ?X
   |  Not(And(?X,?Y)): Or(Not(?X), Not(?Y))
   |  Not(Or(?X,?Y)):  And(Not(?X), Not(?Y))
   |  Implies(?X,?Y):  Or(Not(?X), ?Y)
   |  And (?X, ?X):     ?X
   |  Or (?X, ?X):      ?X
   |  Implies (?X, ?X): ?X
   // etc ...
   };

The rewrite class definition creates a new class of the same name. This new class defines an implicit operator () with the protocol below. This member function can be invoked to perform the rewriting in a functional syntax.

   class Simplify : ... {
   {  ...
   public:
      void operator () (Wff);
      // Wff operator () (Wff); // if rewrite class is applicative
   };

   Wff wff = ...;
   Simplify simplify; // create a new instance of the rewrite class
   simplify(wff);    // rewrite the term wff

State caching

Replacements during rewriting often require state information to be recomputed to further the matching process. Since computation of state encoding can involve a complete traversal of a term, replacement can become expensive if the replacement term is large. For instance, consider the following replacement rule, which replaces all expressions of the form 2*x into x+x:

   rewrite class StrengthReduction
   {
      MUL (INT 2, ?x):  ADD(?x, ?x)
      ...
   }

Since the subterm ?x could be arbitrarily large, recomputing the state encoding for ADD(?x,?x) takes time in proportion to the size of ?x. In order to speedup this replacement process, state encoding caching can be enabled, which in the example above means that the state encoding for ADD(?x,?x) can be recomputed directly from the state encoding of ?x. State caching is enabled by adding a rewrite qualifiers in the definition of a datatype, as in:

   datatype Exp :: rewrite
      = INT (int)
      | ID  (const char *)
      | ADD (Exp, Exp)
      | SUB (Exp, Exp)
      | MUL (Exp, Exp)
      | DIV (Exp, Exp)
      ;

Conditional rewriting and actions

Rewriting rules may be guarded with predicates to limit their applicability. In addition, the rhs of a rewrite rule is not limited to only a replacement expression: in general, any arbitrarily complex sequence of code may be used. For example, in the following set of rewriting rules we use guards to prevent undesirable replacements to be made during expression constant folding:

   rewrite class ConstantFolding
   {  ADD (INT a, INT b):  INT(a+b)
   |  SUB (INT a, INT b):  INT(a-b)
   |  MUL (INT a, INT b):
      {  int c = a * b;                      // silent overflow
         if (a == 0 || b == 0 || c / b == a) // no overflow?
         {  rewrite(INT(c)); }
         else
         {  cerr << "Overflow in multiply\n"; }
      }
   |  DIV (INT a, INT b) | b == 0:  { cerr << "Division by zero\n"; }
   |  DIV (INT a, INT b):  INT(a/b)
   |  // etc...
   };

The rewrite statement

While the rewrite class construct provides a very general abstraction for rewriting, in general its full power is unneeded. It is often convenient to be able to perform rewriting on a term without having to make a new name for a class just for the occasion, especially if member functions and member data are unneeded. To accommodate these situations, the rewrite statement is provided to perform a set rewriting transformations on a term without having to define a temporary rewrite class. It is simply syntactic sugar for the more general(but cumbersome) rewrite class and rewrite rules specifications. For example, a simplify routine for type Exp defined above can be specified as follows:

   Exp simplify (Exp e)
   {  // transformations on e before
      rewrite (e) type (Exp)
      {  ADD (INT a, INT b):  INT(a+b)
      |  SUB (INT a, INT b):  INT(a-b)
      |  MUL (INT a, INT b):  INT(a*b)
      |  ...
      }
      // transformations on e after
      return e;
   }

The rewrite normally performs the replacement in place. An applicative version of the same can be written as follows:

   Exp simplify (Exp e)
   {  rewrite (e) => e type (Exp)
      {  ADD (INT a, INT b):  INT(a+b)
      |  SUB (INT a, INT b):  INT(a+b)
      |  MUL (INT a, INT b):  INT(a*b)
      |  ...
      }
      return e;
   }

Confluence and termination
This section is under construction.

Commutivity and associativity
This section is under construction.

Next: Inference Up: The Language Previous: Algebraic datatypes and

Allen Leung
Wed Mar 6 20:55:43 EST 1996