A quick tour of pattern matching

Next: Pattern matching versus Up: Algebraic Datatypes and Previous: Algebraic Datatypes and

A quick tour of pattern matching

Algebraic datatypes are specified using datatype definitions, which define the inductive structure of one of more types using a tree-grammar like syntax. When a datatype is declared, the following operations are implicitly defined by the datatype compiler: (1) the constructors for all the variants of a type; (2) the identity test operator ==, and the assignment operator = for this type; and (3) the member functions needed to decompose a datatype value during pattern matching.

We'll select the internals of a compiler for a simplified imperative language as the running example in this section. Suppose that in this language an expression is composed of identifiers, integer constants and the four arithmetic operators. Then the structure of the abstract syntax tree can be specified as follows:

   datatype Exp = INT (int)
                | ID  (const char *)
                | ADD (Exp, Exp)
                | SUB (Exp, Exp)
                | MUL (Exp, Exp)
                | DIV (Exp, Exp)
                ;

Abstract syntax of an expression such as a * b - 17 can be constructed directly in a prefix syntax, directly mirroring that of the definition. The Prop datatype compiler will automatically generate a C++ class hierarchy to represent the variants of type Exp. Datatype constructor functions(not to be mistaken with C++'s class constructors) will also be automatically generated using the same names as the variants.

   Exp formula = ADD(MUL(ID("a"),ID("b")),INT(17));

Datatype values can be decomposed using the match statement, which can be seen as a generalization of 's switch construct. Pattern matching is a combination of conditional branching and value binding. For example, a typical evaluation function for the type Exp can be written as in the following example. Notice that each arm of a case is in fact a pattern(with optional variables) mirroring the syntax of a datatype. The pattern variables(written with the prefix ? in the sequel) of a matching arm is bound to the value of the matching value, which can be subsequently referenced and modified in the action of an arm.

int eval (Exp e, const map<const char *, int>& env)
{  match (e)
   {  case INT ?i:        return ?i;
      case ID  ?id:       return env[?id];
      case ADD (?e1,?e2): return eval(?e1,env) + eval(?e2,env);
      case SUB (?e1,?e2): return eval(?e1,env) - eval(?e2,env);
      case MUL (?e1,?e2): return eval(?e1,env) * eval(?e2,env);
      case DIV (?e1,?e2): return eval(?e1,env) / eval(?e2,env);
   }
}

Next: Pattern matching versus Up: Algebraic Datatypes and Previous: Algebraic Datatypes and

Allen Leung
Mon Apr 7 14:33:55 EDT 1997