Algebraic datatypes are specified using datatype
definitions,
which define the inductive structure of one of more types using
a tree-grammar like syntax.
When a datatype is declared,
the following operations are implicitly defined by the datatype compiler:
(1) the constructors for all the variants of a type;
(2) the identity test operator ==
, and the assignment
operator =
for this type; and
(3) the member functions needed to decompose a datatype value during
pattern matching.
We'll select the internals of a compiler for a simplified imperative language as the running example in this section. Suppose that in this language an expression is composed of identifiers, integer constants and the four arithmetic operators. Then the structure of the abstract syntax tree can be specified as follows:
datatype Exp = INT (int) | ID (const char *) | ADD (Exp, Exp) | SUB (Exp, Exp) | MUL (Exp, Exp) | DIV (Exp, Exp) ;
Abstract syntax of an expression such as a * b - 17
can be constructed directly in a prefix syntax, directly mirroring
that of the definition. The Prop datatype
compiler will automatically generate a C++ class hierarchy to represent
the variants of type Exp
. Datatype constructor
functions(not to be mistaken with C++'s class constructors) will
also be automatically generated using the same names as the variants.
Exp formula = ADD(MUL(ID("a"),ID("b")),INT(17));
Datatype values can be decomposed using the match
statement, which
can be seen as a generalization of 's switch
construct. Pattern
matching is a combination of conditional branching and value
binding. For example, a typical evaluation function for the type Exp
can be written as in the following example. Notice that each arm of
a case
is in fact a pattern(with optional variables)
mirroring the syntax of a datatype. The pattern variables(written
with the prefix ?
in the sequel) of a matching
arm is bound to the value of the matching value, which can be
subsequently referenced and modified in the action of an arm.
int eval (Exp e, const map<const char *, int>& env) { match (e) { case INT ?i: return ?i; case ID ?id: return env[?id]; case ADD (?e1,?e2): return eval(?e1,env) + eval(?e2,env); case SUB (?e1,?e2): return eval(?e1,env) - eval(?e2,env); case MUL (?e1,?e2): return eval(?e1,env) * eval(?e2,env); case DIV (?e1,?e2): return eval(?e1,env) / eval(?e2,env); } }