In this section we briefly describe some of the more interesting implementation techniques we have used and some of the problems we have encountered.
Data structure mapping
Pattern matching and rewriting
ML-style pattern matching is compiled by first constructing an automaton using an algorithm similar to that of []. The automaton is then translated directly into C++ code. Recently, Ramakrishnan et al.[] proposed an adaptive algorithm which, instead of using a fixed left-to-right matching order, derives a traversal order for each set of patterns. Unfortunately, computing an optimal traversal order is co-NP for typed patterns, so instead we use a few heuristics proposed in the paper to select the traversal order.
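For illustration, the code emitted for a match statement is essentially a decision tree over constructor tags. The following is only a hedged sketch of that idea in plain C++; the datatype, tag names, and patterns are hypothetical, not generated Prop output.

    // Patterns:  Add(Num 0, y) -> y  |  Mul(x, Num 1) -> x  |  e -> e
    enum ExpTag { Num, Add, Mul };
    struct Exp { ExpTag tag; int value; Exp *l, *r; };

    Exp* simplify(Exp* e) {
        switch (e->tag) {                  // one test per automaton state
        case Add:                          // Add(Num 0, y) -> y
            if (e->l->tag == Num && e->l->value == 0) return e->r;
            break;
        case Mul:                          // Mul(x, Num 1) -> x
            if (e->r->tag == Num && e->r->value == 1) return e->l;
            break;
        default: break;
        }
        return e;                          // default case: e -> e
    }

Each automaton state becomes a switch or comparison, so no pattern is examined more than once along any path.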
Rewriting in Prop is compiled into bottom-up tree automata[HO82] with an index map compression heuristic[Cha87]. Large index maps are also compressed on a secondary level using []'s algorithm for trie compression. BURS-like tree automata[] are generated for rewriting systems with fixed reduction costs, while systems whose cost functions are determined at runtime are compiled into code that performs cost minimization using dynamic programming. Table lookup optimizations are then performed by encoding compile-time-determinable lookups into direct switch statements.
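As a rough illustration of the bottom-up labeling pass, each node receives an automaton state by a table lookup on its operator and the states of its children. The sketch below assumes hypothetical table names, sizes, and a binary-operator encoding; it is not the actual generated code.

    struct Node { int op; Node* kids[2]; int state; };

    constexpr int NSTATES = 16;                  // hypothetical automaton size
    static int leafState[8];                     // state per nullary operator
    static int transition[8][NSTATES][NSTATES];  // per-operator index maps,
                                                 // filled in by the generated tables

    int label(Node* n) {                         // one table lookup per node
        if (!n->kids[0]) return n->state = leafState[n->op];
        int s0 = label(n->kids[0]);
        int s1 = n->kids[1] ? label(n->kids[1]) : 0;  // 0 pads unary operators
        return n->state = transition[n->op][s0][s1];
    }

The index map compression mentioned above shrinks these transition tables, and lookups whose outcome is known at compile time are folded into direct switch statements.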
For typical rewriting systems with fewer than 500 rules, the rewriting compiler is able to generate the tree automaton in less than 5 seconds. A simple benchmark that computes the 25th Fibonacci number using in-place rewriting with a naive exponential algorithm takes 3.29 seconds (GC time .17 sec) of process time on a Sparc 5 workstation, including instrumentation and garbage collection overhead. This benchmark performs 1,335,317 matches and 364,177 replacements. Since rewriting speed is largely independent of the size of the pattern set, this shows that rewriting in Prop can proceed at over 400,000 matches and over 110,000 replacements per second on typical machines.
Inference
Inference rules in Prop are compiled into a data flow network, which is then flattened into table form. At runtime an interpreter using the RETE algorithm is invoked to process the network and dispatch to the various matching routines. Management of the network's memories and token propagation is handled within the interpreter object, so no user-written management code is needed.
To speed up the pattern matching process, the inference rule compiler also performs several optimizations.
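To make the dataflow structure concrete, the following is a minimal hedged sketch of a RETE-style two-input join node and its right activation; all type and member names here are illustrative and do not reflect Prop's actual interpreter API.

    #include <vector>
    #include <functional>

    struct Fact { int kind; int value; };      // a working-memory element
    using Token = std::vector<const Fact*>;    // partial match, one fact per condition

    struct JoinNode {
        std::vector<Token> leftMemory;         // tokens from earlier conditions
        std::vector<const Fact*> rightMemory;  // facts matching this condition
        std::function<bool(const Token&, const Fact&)> test;   // inter-condition test
        std::function<void(const Token&)> onMatch;  // fire rule or feed next join
    };

    // When a new fact arrives, extend every compatible token with it.
    void activateRight(JoinNode& n, const Fact& f) {
        n.rightMemory.push_back(&f);
        for (const Token& t : n.leftMemory)
            if (n.test(t, f)) {
                Token extended = t;
                extended.push_back(&f);
                n.onMatch(extended);           // propagate the new token downstream
            }
    }

Because the memories cache partial matches, each new fact only pays for the joins it can actually extend, which is the essential saving of the RETE algorithm.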
Garbage collection
We use a conservative garbage collector whose framework is based on the work on Customisable Memory Management (CMM)[AF] in the PoSSe algebraic system. This scheme is in turn based on the work on mostly-copying garbage collection by Bartlett[Bar88] in the Scheme to C runtime system. Similar to CMM, garbage collectable objects in Prop are all derived from the base class GCObject. Each subclass of GCObject reimplements the virtual function trace(GC *), which is responsible for providing type feedback by traversing the pointers within the object and calling a garbage collector method for each pointer found. Garbage collector objects are implemented as visitors[GHJV95]. This object tracing method is generated automatically for each user-defined datatype by the datatype compiler.
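Concretely, the protocol looks roughly like the sketch below. GCObject and trace(GC *) are from the text; the Cons example and the tracePointer method name are hypothetical.

    class GC;                                  // the collector acts as a visitor [GHJV95]

    class GCObject {
    public:
        virtual ~GCObject() {}
        virtual void trace(GC* gc) = 0;        // enumerate this object's pointers
    };

    class GC {
    public:
        virtual ~GC() {}
        virtual void tracePointer(GCObject*& p) = 0;  // collector-specific action
    };

    // What the datatype compiler might generate for a cons-cell datatype:
    class Cons : public GCObject {
    public:
        GCObject* head;
        GCObject* tail;
        void trace(GC* gc) override {
            gc->tracePointer(head);            // report each pointer to the collector
            gc->tracePointer(tail);
        }
    };

A mark-sweep collector's tracePointer would mark the object, while a copying collector's would forward the pointer; the generated trace methods are the same either way.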
The collectors can discover the current root set by scanning the stack, registers, and the static data and heap areas. Thus no root registration or inefficient smart pointer schemes[Ede92] are necessary: a pointer to a collectable object looks the same as any other pointer and can be placed in machine registers or otherwise optimized by the compiler in the usual manner.
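One conventional way to realize the stack and register scan, shown here only as a hedged sketch, is to spill the registers with setjmp and walk the stack word by word. The stackBottom bookkeeping and a downward-growing stack are assumptions of this sketch.

    #include <csetjmp>
    #include <cstdint>

    extern char* stackBottom;   // highest stack address, recorded at startup (assumption)

    // Report every word on the stack that might be a pointer; setjmp spills
    // the callee-saved registers into `regs`, so they get scanned as well.
    void scanStackForRoots(void (*maybeRoot)(std::uintptr_t)) {
        std::jmp_buf regs;
        setjmp(regs);
        for (char* p = reinterpret_cast<char*>(&regs); p < stackBottom;
             p += sizeof(void*))
            maybeRoot(*reinterpret_cast<std::uintptr_t*>(p));
    }

Any word whose value falls inside the collector's heap is treated as a potential root, which is what makes the scheme conservative.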
By detaching the traversal method from the specific implementation of the garbage collectors, we are able to implement various garbage collection algorithms that can coexist with each other. Two main collection algorithms have been implemented.
Both collectors use the same abstract protocol to communicate with collectable objects, and because of late binding, the actual collection algorithm is determined at runtime. This makes it possible for the user to experiment with various collection algorithms without recompilation.
In addition, a number of supporting features have been implemented.
Some optimizations
We have implemented a few optimizations in the collector. The most important of these is a variant of the blacklisting scheme proposed in [Boe93] for a mark-sweep style GC: when new memory is allocated, we discard all pages whose addresses may be misidentified as live data in the future. Unlike Boehm's scheme, however, we blacklist an entire page (512 bytes) rather than a single object at a time. This is necessary because keeping the blacklist at object granularity would make compaction overly complicated. Of course, blacklisting at page granularity may cause the system to leave many pages unnecessarily unused. However, we have not observed any degenerate effects due to this scheme in our applications. Furthermore, since unused pages are not dirtied, they will not need to be swapped out by virtual memory. For extra efficiency, we plan to implement memory mapping/unmapping with mmap/munmap (or their equivalents) for OSes supporting these features.
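In outline, the scheme works as in the following hedged sketch. The 512-byte page size comes from the text; the names and the set-based representation are illustrative only.

    #include <set>
    #include <cstdint>

    constexpr std::uintptr_t PAGE_SIZE = 512;    // page granularity used above
    static std::set<std::uintptr_t> blacklist;   // pages stray values point into

    // During a conservative scan, a word that looks like a pointer into
    // not-yet-allocated space blacklists the whole page it addresses.
    void blacklistCandidate(std::uintptr_t value) {
        blacklist.insert(value & ~(PAGE_SIZE - 1));
    }

    // The allocator skips any fresh page whose address is blacklisted, so a
    // future misidentification cannot pin that page's objects as live.
    bool pageIsUsable(std::uintptr_t pageAddr) {
        return blacklist.count(pageAddr & ~(PAGE_SIZE - 1)) == 0;
    }

Tracking whole pages rather than individual objects keeps the blacklist compatible with compaction, at the cost of occasionally retiring a page that only one stray value pointed into.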
Future optimizations
We also plan to implement a generational scheme, such as that of [Bar89]. However, since assignment is a common operation in C++, the cost of implementing a write barrier may become prohibitively high. Furthermore, it is unclear how this can be implemented while maintaining compatibility with existing code. Further research is needed in this area.
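To illustrate the concern: a conventional software write barrier wraps pointer fields so that every assignment performs extra bookkeeping, as in this hedged sketch (the card table layout and all names are illustrative, not a proposed design):

    #include <cstdint>

    static unsigned char cardTable[1 << 20];   // one byte per 512-byte card
    constexpr std::uintptr_t CARD_SHIFT = 9;

    // Every pointer store through Ref also marks a card; this per-assignment
    // overhead is the cost referred to above.
    template <class T>
    struct Ref {
        T* ptr;
        Ref& operator=(T* p) {
            ptr = p;
            cardTable[(reinterpret_cast<std::uintptr_t>(this) >> CARD_SHIFT)
                      % sizeof(cardTable)] = 1;
            return *this;
        }
    };

Since plain pointer members would have to be rewritten as such wrappers, the sketch also illustrates the compatibility problem with existing C++ code noted above.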