State caching optimization

Next: Specifying secondary indices Up: The rewrite statement Previous: Rewriting modes

State caching optimization

The pattern matching process proceeds in a bottom up manner: for each subtree of the form l(t₁,t₂,Ľ,t_n), n ł 0, a state number is computed from the current tree label l and the state numbers of its subtrees t₁, t₂, Ľ, t_n. The set of matched rules can be directly computed from the current state number. This means that in the absence of redex replacements, pattern matching takes time proportional only O(|t| + r), where |t| is the size of the input tree t and r is time it takes to execute the rhs actions. That is, the runtime is insensitive to the number of rewrite rules or the complexity of the patterns.

However, if the input is a DAG rather than a tree⁹, or if replacements are performed, then the above analysis may not hold. Replacements during rewriting often require state information of the replacement term to be recomputed to further the matching process. Since computation of state can involve a complete traversal of a term, replacement can become expensive if the replacement term is large. For instance, consider the following replacement rule, which replaces all expressions of the form 2*x into x+x:

   rewrite class StrengthReduction
   {
      MUL (INT 2, ?x):  ADD(?x, ?x)
      ...
   }

Since the subterm ?x could be arbitrarily large, recomputing the state encoding for ADD(?x,?x) naively takes time in proportion to the size of ?x. However, since the subtree ?x occurs as part of the pattern, its state has already been computed, and it would be wasteful to recompute its state from scratch.

A similar observation can be made to the case when the input is a DAG, where many subcomponents are shared between nodes. In order to speed up the rewriting process, the state of each shared copy of the nodes should be computed only once.

In order to speedup the state computation process in these situations, can be enabled by specifying a rewriting index. This can be accomplished in two ways: (i) use the qualifier rewrite when defining a datatype. This specifies the primary index. Or alternatively, (ii) use index declarations to specify one or more secondary indices.

We'll first describe the first method. A rewrite qualifier can be used in the definition of a datatype. For instance:

   datatype Exp :: rewrite
      = INT (int)
      | ID  (const char *)
      | ADD (Exp, Exp)
      | SUB (Exp, Exp)
      | MUL (Exp, Exp)
      | DIV (Exp, Exp)
      ;

The rewrite qualifier tells the translator to generate an extra state field in the implementation of Exp. This state field is implicitly updated during the rewriting process. When the rewriter encounters a node with a cached state it can quickly short-circuit all unnecessary state computation.

A word of caution: since each rewriting system computes its own encoding, rewriting systems should not be mixed if they shard the same index, i.e. actions of a rewriting system should not invoke another rewriting system on a term that is participating in the current rewriting system, if both systems use the same index. This limitation can be overcome by using secondary indices, which we'll discuss next.

Next: Specifying secondary indices Up: The rewrite statement Previous: Rewriting modes

Allen Leung
Mon Apr 7 14:33:55 EDT 1997