Thursday, June 28, 2007

Representing DSL expressions in Haskell When you want to embed a DSL in Haskell you will almost certainly be faced with having some kind of expressions in your DSL. If you are lucky you can get away with using the Haskell types as the DSL type (i.e., use the meta types as object types). E.g., if you DSL needs integer expressions you can use the Haskell type Integer. But sometimes you're not that lucky, so you need to represent the DSL expressions as a Haskell data type. What are the options for representing expressions in Haskell? A normal data type Let's start with the most obvious one, an ordinary data type with constructors for the different kinds of expressions. As an example I'll have a simple expression language that has constants (of type Int), addition, less-or-equal, and conditional.
data Exp
    = ConI Int
    | Add Exp Exp
    | LE Exp Exp
    | Cond Exp Exp Exp
    deriving (Show)
Just to exemplify, here's an evaluator for these expressions. We need a type of values that the evaluator can return as well.
data Val = ValI Int | ValB Bool
    deriving (Show)

eval :: Exp -> Val
eval (ConI i) = ValI i
eval (Add x y) = case (eval x, eval y) of
                 (ValI vx, ValI vy) -> ValI (vx + vy)
                 _ -> error "Bad arguments to Add"
eval (LE x y) = case (eval x, eval y) of
                (ValI vx, ValI vy) -> ValB (vx <= vy)
                _ -> error "Bad arguments to LE"
eval (Cond x y z) = case (eval x) of
                    ValB b -> if b then eval y else eval z
                    _ -> error "Bad arguments to Cond"
And a test run:
main = print $ eval e
  where e = Cond (LE (ConI 3) (ConI 5)) (Add (ConI 1) (ConI 2)) (ConI 0)
prints ValI 3 This is OK, but look at all the error cases in the evaluator. Remember that these are expressions in a DSL, so we are going to have expression fragments in our Haskell code. If we make a mistake, like Cond (ConI 1) (ConI 1) (ConI 1) this is not going to be caught by the Haskell type checker since everything is of type Exp. Instead it is going to cause some error when the program is run. Being advocates of static type checking (why else use Haskell?), this is rather disgusting. GADTs So let's try a new and shiny feature in GHC, namely GADTs. (GADTs are new in ghc, but the idea is old; it's been well known in constructive type theory for ages, Kent Petersson and I suggested it as an addition to Haskell almost 15 years ago.) With a GADT you can specify more precise types for the constructors.
data Exp a where
    ConI :: Int -> Exp Int
    Add  :: Exp Int -> Exp Int -> Exp Int
    LE   :: Exp Int -> Exp Int -> Exp Bool
    Cond :: Exp Bool -> Exp a -> Exp a -> Exp a
These are the types we want. It's now impossible to construct ill typed value of type Exp t; it will be caught by the Haskell type checker. The evaluator looks very neat and natural with GADTs
eval :: Exp a -> a
eval (ConI i) = i
eval (Add x y) = eval x + eval y
eval (LE x y) = eval x <= eval y
eval (Cond x y z) = if eval x then eval y else eval z
GADTs is really the way to go, but it has the disadvantage of not being standard Haskell. It can also get somewhat cumbersome when we have variables; the evaluator now needs a typed environment. It's certainly doable, but starting to look less nice. Phantom types Let's explore a third way that in some sense combines the previous two. The idea is to have an untyped representation, like Exp, but to use an abstract data type on top of it that only allows constructing well typed expressions. We start with the original type, but rename it to Exp'
data Exp'
    = ConI Int
    | Add Exp' Exp'
    | LE Exp' Exp'
    | Cond Exp' Exp' Exp'
    deriving (Show)
On top of this we provide the type the user will see.
newtype Exp a = E Exp'
    deriving (Show)

conI :: Int -> Exp Int
conI = E . ConI

add :: Exp Int -> Exp Int -> Exp Int
add (E x) (E y) = E $ Add x y

le :: Exp Int -> Exp Int -> Exp Bool
le (E x) (E y) = E $ LE x y

cond :: Exp Bool -> Exp a -> Exp a -> Exp a
cond (E x) (E y) (E z) = E $ Cond x y z
The functions conI, add, le, and cond are the only ones the DSL user will see. Note that they have the same types as the constructors in the GADT. To ensure the DSL user can only use these we need to put it all in a module and export the right things.
module Exp(Exp, conI, add, le, cond) where ...
We can write an evaluator again (inside the Exp module).
eval :: Exp a -> Val
eval (E x) = eval' x

eval' :: Exp' -> Val
eval' (ConI i) = ValI i
eval' (Add x y) = case (eval' x, eval' y) of
                 (ValI vx, ValI vy) -> ValI (vx + vy)
                 _ -> error "Bad arguments to Add"
eval' (LE x y) = case (eval' x, eval' y) of
                 (ValI vx, ValI vy) -> ValB (vx <= vy)
                 _ -> error "Bad arguments to LE"
eval' (Cond x y z) = case (eval' x) of
                    ValB b -> if b then eval' y else eval' z
                    _ -> error "Bad arguments to Cond"
Not as elegant as the GADT evaluator, but at least the error cases are impossible when expressions are constructed using the exported interface. And the test would now look like
main = print $ eval e
  where e = cond (le (conI 3) (conI 5)) (add (conI 1) (conI 2)) (conI 0)
Phantom types gets its name from the fact that the type variable in the definition of the Exp type doesn't occur anywhere in the constructors; so it's a phantom. Now and then people suggest implementing a dynamically typed version of Haskell. Phantom types is one of the things that makes such an implementation difficult. There are no values that carry the type information at runtime, the types only exist at compile time.

5 comments:

  1. Thanks for an interesting post. Is the fact that GADTs are not part of Haskell98 enough of a reason not to use them, in your opinion, when they are supported by the compilers available and add expressive power without taking anything away? Or are you simply contrasting the possible approaches?

    Cheers,

    ReplyDelete
  2. GADTs are only supported by ghc(jhc?), and if I can I try to avoid such features.
    As I said, I like them, but I'm still wary. :)

    ReplyDelete
  3. Hi Lennart. I'm glad to see you playing with expressions & code generation. Self-plug: have you read "Compiling embedded languages" (describing the Pan implementation)? It discusses some of the same issues, including the choice of phantom types, as well as expression optimization. Now I prefer GADTs, but only if & when they work properly with type classes, as necessary for the rewriting/optimization. The big wins for me with GADTs are (a) fewer levels of representation, and (b) statically typed (and thus more readily correct) rewrite rules. I'd also really like to write the rules in a friendly concrete syntax, but I don't know how to do that easily.

    I have a new version of Pan, called "Pajama", and I'd like to try out the GADT style (with statically typed rewrites). It generates Java now, but I'd be interested in another back-end that uses Harpy. Since you're playing with this sort of thing, I wonder if you would be interested in collaborating on the project. It'd be more fun & enticing for me to work on in that case. I'm actively working on a new Eros implementation that could also tie in to Harpy. The original motivation for Eros is as a Pan/Pajama replacement for non-programmers, using concrete rather than syntactic composition. None of these projects is at all specific to image synthesis. The systems/compilers are fully general.

    ReplyDelete
  4. Conal,
    I had seen you paper, but it was too long ago for me to remember. :)

    ReplyDelete
  5. What a lot of spam! Anyway, just thought I'd let you know I found this helpful and you've saved me from wasting a lot of time trying to get GADTs to do what I want.

    ReplyDelete