In this section we briefly describe some of the more interesting implementation techniques we have used and some of the problems we have encountered.
Data structure mapping
Pattern matching and rewriting
ML-style pattern matching is compiled by first constructing an automaton using an algorithm similar to that of []. The automaton is then translated directly into C++ code. Recently, Ramakrishnan et al.[] proposed an adaptive algorithm which, instead of using a fixed left-to-right matching order, derives a traversal order for each set of patterns. Unfortunately, computing an optimal traversal order is co-NP for typed patterns, so instead we use a few heuristics proposed in the paper to select the traversal order.
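For illustration, the code emitted for a match statement is essentially a decision tree over constructor tags. The following is only a hedged sketch of that idea in plain C++; the datatype, tag names, and patterns are hypothetical, not generated Prop output.

    // Patterns:  Add(Num 0, y) -> y  |  Mul(x, Num 1) -> x  |  e -> e
    enum ExpTag { Num, Add, Mul };
    struct Exp { ExpTag tag; int value; Exp *l, *r; };

    Exp* simplify(Exp* e) {
        switch (e->tag) {                  // one test per automaton state
        case Add:                          // Add(Num 0, y) -> y
            if (e->l->tag == Num && e->l->value == 0) return e->r;
            break;
        case Mul:                          // Mul(x, Num 1) -> x
            if (e->r->tag == Num && e->r->value == 1) return e->l;
            break;
        default: break;
        }
        return e;                          // default case: e -> e
    }

Each automaton state becomes a switch or comparison, so no pattern is examined more than once along any path.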
Rewriting in Prop is compiled into bottom-up tree automata[HO82] with an index map compression heuristic[Cha87]. Large index maps are also compressed on a secondary level using []'s algorithm for trie compression. BURS-like tree automata[] are generated for rewriting systems with fixed reduction costs, while systems whose cost functions are determined at runtime are compiled into code that performs cost minimization using dynamic programming. Table lookup optimizations are then performed by encoding compile-time-determinable lookups into direct switch statements.
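As a rough illustration of the bottom-up labeling pass, each node receives an automaton state by a table lookup on its operator and the states of its children. The sketch below assumes hypothetical table names, sizes, and a binary-operator encoding; it is not the actual generated code.

    struct Node { int op; Node* kids[2]; int state; };

    constexpr int NSTATES = 16;                  // hypothetical automaton size
    static int leafState[8];                     // state per nullary operator
    static int transition[8][NSTATES][NSTATES];  // per-operator index maps,
                                                 // filled in by the generated tables

    int label(Node* n) {                         // one table lookup per node
        if (!n->kids[0]) return n->state = leafState[n->op];
        int s0 = label(n->kids[0]);
        int s1 = n->kids[1] ? label(n->kids[1]) : 0;  // 0 pads unary operators
        return n->state = transition[n->op][s0][s1];
    }

The index map compression mentioned above shrinks these transition tables, and lookups whose outcome is known at compile time are folded into direct switch statements.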
For typical rewriting systems with fewer than 500 rules, the rewriting compiler is able to generate the tree automaton in less than 5 seconds. A simple benchmark that computes the 25th Fibonacci number using in-place rewriting with a naive exponential algorithm takes 3.29 seconds (GC time .17 sec) of process time on a Sparc 5 workstation, including instrumentation and garbage collection overhead. This benchmark performs 1,335,317 matches and 364,177 replacements. Since rewriting speed is largely independent of the size of the pattern set, this shows that rewriting in Prop can proceed at over 400,000 matches and over 110,000 replacements per second on typical machines.
Inference
Inference rules in Prop are compiled into a data flow network, which is then flattened into table form. At runtime an interpreter using the RETE algorithm is invoked to process the network and dispatch to the various matching routines. Management of the network's memories and token propagation is handled within the interpreter object, so no user-written management code is needed.
To speed up the pattern matching process, the inference rule compiler also performs several optimizations.
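To make the dataflow structure concrete, the following is a minimal hedged sketch of a RETE-style two-input join node and its right activation; all type and member names here are illustrative and do not reflect Prop's actual interpreter API.

    #include <vector>
    #include <functional>

    struct Fact { int kind; int value; };      // a working-memory element
    using Token = std::vector<const Fact*>;    // partial match, one fact per condition

    struct JoinNode {
        std::vector<Token> leftMemory;         // tokens from earlier conditions
        std::vector<const Fact*> rightMemory;  // facts matching this condition
        std::function<bool(const Token&, const Fact&)> test;   // inter-condition test
        std::function<void(const Token&)> onMatch;  // fire rule or feed next join
    };

    // When a new fact arrives, extend every compatible token with it.
    void activateRight(JoinNode& n, const Fact& f) {
        n.rightMemory.push_back(&f);
        for (const Token& t : n.leftMemory)
            if (n.test(t, f)) {
                Token extended = t;
                extended.push_back(&f);
                n.onMatch(extended);           // propagate the new token downstream
            }
    }

Because the memories cache partial matches, each new fact only pays for the joins it can actually extend, which is the essential saving of the RETE algorithm.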
Garbage collection
We use a conservative garbage collector whose framework is based on the work on Customisable Memory Management (CMM)[AF] in the PoSSe algebraic system. This scheme is in turn based on the work on mostly-copying garbage collection by Bartlett[Bar88] in the Scheme to C runtime system. Similar to CMM, garbage collectable objects in Prop are all derived from the base class GCObject. Each subclass of GCObject reimplements the virtual function trace(GC *), which is responsible for providing type feedback by traversing the pointers within the object and calling a garbage collector method for each pointer found. Garbage collector objects are implemented as visitors[GHJV95]. This object tracing method is generated automatically for each user-defined datatype by the datatype compiler.
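Concretely, the protocol looks roughly like the sketch below. GCObject and trace(GC *) are from the text; the Cons example and the tracePointer method name are hypothetical.

    class GC;                                  // the collector acts as a visitor [GHJV95]

    class GCObject {
    public:
        virtual ~GCObject() {}
        virtual void trace(GC* gc) = 0;        // enumerate this object's pointers
    };

    class GC {
    public:
        virtual ~GC() {}
        virtual void tracePointer(GCObject*& p) = 0;  // collector-specific action
    };

    // What the datatype compiler might generate for a cons-cell datatype:
    class Cons : public GCObject {
    public:
        GCObject* head;
        GCObject* tail;
        void trace(GC* gc) override {
            gc->tracePointer(head);            // report each pointer to the collector
            gc->tracePointer(tail);
        }
    };

A mark-sweep collector's tracePointer would mark the object, while a copying collector's would forward the pointer; the generated trace methods are the same either way.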
The collectors can discover the current root set by scanning the stack, registers, and the static data and heap areas. Thus no root registration or inefficient smart pointer schemes[Ede92] are necessary: a pointer to a collectable object looks the same as any other pointer and can be placed in machine registers or otherwise optimized by the compiler in the usual manner.
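One conventional way to realize the stack and register scan, shown here only as a hedged sketch, is to spill the registers with setjmp and walk the stack word by word. The stackBottom bookkeeping and a downward-growing stack are assumptions of this sketch.

    #include <csetjmp>
    #include <cstdint>

    extern char* stackBottom;   // highest stack address, recorded at startup (assumption)

    // Report every word on the stack that might be a pointer; setjmp spills
    // the callee-saved registers into `regs`, so they get scanned as well.
    void scanStackForRoots(void (*maybeRoot)(std::uintptr_t)) {
        std::jmp_buf regs;
        setjmp(regs);
        for (char* p = reinterpret_cast<char*>(&regs); p < stackBottom;
             p += sizeof(void*))
            maybeRoot(*reinterpret_cast<std::uintptr_t*>(p));
    }

Any word whose value falls inside the collector's heap is treated as a potential root, which is what makes the scheme conservative.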
By detaching the traversal method from the specific implementation of the garbage collectors, we are able to implement various garbage collection algorithms that can coexist with each other. Two main collection algorithms have been implemented.
Both collectors use the same abstract protocol to communicate with collectable objects, and because of late binding, the actual collection algorithm is determined at runtime. This makes it possible for the user to experiment with various collection algorithms without recompilation.
In addition, a number of supporting features have been implemented.
Some optimizations
We have implemented a few optimizations in the collector. The most important of these is a variant of the blacklisting scheme proposed in [Boe93] for a mark-sweep style GC: when new memory is allocated, we discard all pages whose addresses may be misidentified as live data in the future. Unlike Boehm's scheme, however, we blacklist an entire page (512 bytes) rather than a single object at a time. This is necessary because keeping the blacklist at object granularity would make compaction overly complicated. Of course, blacklisting at page granularity may cause the system to leave many pages unnecessarily unused. However, we have not observed any degenerate effects due to this scheme in our applications. Furthermore, since unused pages are not dirtied, they will not need to be swapped out by virtual memory. For extra efficiency, we plan to implement memory mapping/unmapping with mmap/munmap (or their equivalents) for OSes supporting these features.
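In outline, the scheme works as in the following hedged sketch. The 512-byte page size comes from the text; the names and the set-based representation are illustrative only.

    #include <set>
    #include <cstdint>

    constexpr std::uintptr_t PAGE_SIZE = 512;    // page granularity used above
    static std::set<std::uintptr_t> blacklist;   // pages stray values point into

    // During a conservative scan, a word that looks like a pointer into
    // not-yet-allocated space blacklists the whole page it addresses.
    void blacklistCandidate(std::uintptr_t value) {
        blacklist.insert(value & ~(PAGE_SIZE - 1));
    }

    // The allocator skips any fresh page whose address is blacklisted, so a
    // future misidentification cannot pin that page's objects as live.
    bool pageIsUsable(std::uintptr_t pageAddr) {
        return blacklist.count(pageAddr & ~(PAGE_SIZE - 1)) == 0;
    }

Tracking whole pages rather than individual objects keeps the blacklist compatible with compaction, at the cost of occasionally retiring a page that only one stray value pointed into.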
Future optimizations
We also plan to implement a generational scheme, such as that of [Bar89]. However, since assignment is a common operation in C++, the cost of implementing a write barrier may become prohibitively high. Furthermore, it is unclear how this can be implemented while maintaining compatibility with existing code. Further research is needed in this area.
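To illustrate the concern: a conventional software write barrier wraps pointer fields so that every assignment performs extra bookkeeping, as in this hedged sketch (the card table layout and all names are illustrative, not a proposed design):

    #include <cstdint>

    static unsigned char cardTable[1 << 20];   // one byte per 512-byte card
    constexpr std::uintptr_t CARD_SHIFT = 9;

    // Every pointer store through Ref also marks a card; this per-assignment
    // overhead is the cost referred to above.
    template <class T>
    struct Ref {
        T* ptr;
        Ref& operator=(T* p) {
            ptr = p;
            cardTable[(reinterpret_cast<std::uintptr_t>(this) >> CARD_SHIFT)
                      % sizeof(cardTable)] = 1;
            return *this;
        }
    };

Since plain pointer members would have to be rewritten as such wrappers, the sketch also illustrates the compatibility problem with existing C++ code noted above.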