Wednesday, June 10, 2009

More LLVM

Recently someone asked me on #haskell if you could use the Haskell LLVM bindings to compile some abstract syntax to a Haskell function. Naturally I said yes, but then I realized I had only done it for a boring language with just one type. I had no doubt that it could be done for more complicated languages with multiple types, but it might not be totally obvious how. So I decided to write a simple compiler, and this blog post is the result. First, a simple example:
main = do
        let f :: Double -> Double
            Just f = compile "\\ (x::Double) -> if x == 0 then 0 else 1/(x*x)"
        print (f 2, f 3, f 0)
Running this program produces (as expected)
(0.25,0.1111111111111111,0.0)
What has happened is that the string has been parsed to an abstract syntax tree, translated into LLVM code, then to machine code, and finally turned back into a Haskell callable function. Many things can go wrong along the way, like syntax and type errors, so compile returns a Maybe type to indicate if things went right or wrong. (A more serious version of the compile function would return an error message when something has gone wrong.) The definition of the compilation function is simple and illustrates the flow of the compiler
compile :: (Translate a) => String -> Maybe a
compile = fmap translate . toTFun <=< mParseUFun
The context Translate is there to limit the types that can actually be translated; it's a necessary evil and exactly what types are allowed depends on how advanced we make the compiler. Had we ignored the Maybe type the definitions would have been
compile = translate . toTFun . mParseUFun
which says, first parse to the type UFun (untyped expressions), then type check and turn it into the type TFun a, and finally translate TFun a into an a by LLVM compilation. Let's see how this all works.

The UExp module

The first step is to just define an abstract syntax for the expressions that we want to handle. I'm only allowing leading lambdas (this a very first order language), so there's a distiction between the top level UFun type and the expression type UExp. The U prefix indicates that this version of the syntax is not yet type checked. The definition is pretty boring, but here it is:
{-# OPTIONS_GHC -fno-warn-unused-binds #-}
{-# LANGUAGE RecordWildCards #-}
module UExp(Id, UFun(..), UTyp(..), UExp(..), parseUFun, showOp, mParseUFun) where
import Data.Maybe
import Data.List
import Data.Function
import Text.ParserCombinators.Parsec
import Text.ParserCombinators.Parsec.Expr
import Text.ParserCombinators.Parsec.Token
import Text.ParserCombinators.Parsec.Language

type Id = String

data UFun = UFun [(Id, UTyp)] UExp

data UTyp = UTBol | UTDbl

data UExp
    = UDbl Double        -- ^ Double literal
    | UBol Bool          -- ^ Bool literal
    | UVar Id            -- ^ Variable
    | UApp Id [UExp]     -- ^ Function application
    | ULet Id UExp UExp  -- ^ Local binding
Naturally, we want to be able to show the expressions, if nothing else so for debugging. So I make a Show instance that shows them in a nice way respecting operator precedences etc. There's nothing exciting going on, the large number of lines is just to cover operator printing.
instance Show UFun where
    showsPrec p (UFun [] e) = showsPrec p e
    showsPrec p (UFun vts e) = showParen (p>0) (showString "\\ " . foldr (.) (showString "-> ") (map f vts) . showsPrec 0 e)
      where f (v, t) = showParen True (showString v . showString " :: " . showsPrec 0 t) . showString " "

instance Show UTyp where
    showsPrec _ UTDbl = showString "Double"
    showsPrec _ UTBol = showString "Bool"

instance Show UExp where
    showsPrec p (UDbl d) = showsPrec p d
    showsPrec p (UBol b) = showsPrec p b
    showsPrec _ (UVar i) = showString i
    showsPrec p (UApp "if" [c, t, e]) =
      showParen (p>0) (showString "if " . showsPrec 0 c . showString " then " . showsPrec 0 t . showString " else " . showsPrec 0 e)
    showsPrec p (UApp op [a, b]) = showOp p op a b
    showsPrec _ (UApp op _) = error $ "Uxp.show " ++ op
    showsPrec p (ULet i e b) =
      showParen (p>0) (showString "let " . showString i . showString " = " . showsPrec 0 e . showString " in " . showsPrec 0 b)

showOp :: (Show a, Show b) => Int -> String -> a -> b -> String -> String
showOp q sop a b = showParen (q>mp) (showsPrec lp a . showString sop . showsPrec rp b)
  where (lp,mp,rp) = case lookup sop ops of
                    Just (p, AssocLeft)  -> (p,   p, p+1)
                    Just (p, AssocRight) -> (p+1, p, p)
                    Just (p, AssocNone)  -> (p+1, p, p+1)
                    Nothing              -> (9,   9, 10)

ops :: [(String, (Int, Assoc))]
ops = [("+",  (6, AssocLeft)),
       ("-",  (6, AssocLeft)),
       ("*",  (7, AssocLeft)),
       ("/",  (7, AssocLeft)),
       ("==", (4, AssocNone)),
       ("<=", (4, AssocNone)),
       ("&&", (3, AssocRight)),
       ("||", (2, AssocRight))
      ]
We also want to be able to parse, so I'm using Parsec to parse the string and produce an AST. Again, there's nothing interesting going on. I use the Haskell lexical analysis provided by Parsec. This is available as a TokenParser record, which can be conveniently opened with the RecordWildcard notation TokenParser{..}.
parseUFun :: SourceName -> String -> Either ParseError UFun
parseUFun = parse $ do f <- pFun; eof; return f
  where TokenParser{..} = haskell
        pFun = do
            vts <- between (reservedOp "\\")
                           (reservedOp "->")
                           (many $ parens $ do v <- identifier; reservedOp "::"; t <- pTyp; return (v, t))
               <|> return []
            e <- pExp
            return $ UFun vts e
        pTyp = choice [do reserved "Bool"; return UTBol, do reserved "Double"; return UTDbl]
        pExp = choice [pIf, pLet, pOExp]
        pIf = do reserved "if"; c <- pExp; reserved "then"; t <- pExp; reserved "else"; e <- pExp; return $ UApp "if" [c, t, e]
        pLet = do reserved "let"; i <- identifier; reservedOp "="; e <- pExp; reserved "in"; b <- pExp; return $ ULet i e b
        pOExp = buildExpressionParser opTable pAExp
        pAExp = choice [pDbl, pVar, parens pExp]
        pVar = fmap eVar identifier
        pDbl = fmap (either (UDbl . fromInteger) UDbl) naturalOrFloat
        eVar i = if i == "False" then UBol False else if i == "True" then UBol True else UVar i

        opTable = reverse . map (map mkOp) . groupBy ((==) `on` prec) . sortBy (compare `on` prec) $ ops
          where mkOp (s, (_, a)) = Infix (do reservedOp s; return $ \ x y -> UApp s [x, y]) a
                prec = fst . snd

mParseUFun :: String -> Maybe UFun
mParseUFun = either (const Nothing) Just . (parseUFun "")
The parser is packaged up in mParseUFun which returns an AST if it all worked.

The TExp module

Since the LLVM API is typed it's much easier to translate a typed abstract syntax tree than an untyped abstract syntax tree. The TExp module contains the definition of the typed AST and the type checker that converts to it. There are many ways to formulate type safe abstract syntax trees. I've chosen to use GADTs. I've also picked to represent variables (still) by identifiers, which means that the syntax tree is not necessarily type safe. Furthermore, I've chosen a very limited way to represent function application since this is all I need for this example. The variantions on this are endless.
{-# LANGUAGE GADTs, ExistentialQuantification, PatternGuards #-}
module TExp(Id,
            TFun(..), TTyp(..), TExp(..), DblOp(..), BolOp(..), CmpOp(..),
            Equal(..), test,
            Type(..),
            AFun(..), extractFun,
            typeCheck, toTFun) where
import Data.Maybe
import Control.Monad
import UExp

data TFun a where
    TBody :: TExp a                 -> TFun a
    TLam  :: Id -> TTyp a -> TFun b -> TFun (a->b)

data TTyp a where
    TTBol ::                     TTyp Bool
    TTDbl ::                     TTyp Double
    TTArr :: TTyp a -> TTyp b -> TTyp (a->b)

data TExp a where
    TDbl   :: Double                                            -> TExp Double
    TBol   :: Bool                                              -> TExp Bool
    TDblOp :: DblOp     -> TExp Double -> TExp Double           -> TExp Double
    TBolOp :: BolOp     -> TExp Bool   -> TExp Bool             -> TExp Bool
    TCmpOp :: CmpOp     -> TExp Double -> TExp Double           -> TExp Bool
    TIf    :: TExp Bool -> TExp a      -> TExp a                -> TExp a
    TLet   :: Id        -> TTyp a      -> TExp a      -> TExp b -> TExp b
    TVar   :: Id                                                -> TExp a

data DblOp = DAdd | DSub | DMul | DDiv
    deriving (Eq, Show)

data BolOp = BAnd | BOr
    deriving (Eq, Show)

data CmpOp = CEq | CLe
    deriving (Eq, Show)
So for instance, UApp "+" [UVar "x", UDbl 2.2] will be represented by TDblOp DAdd (TVar "x") (TDbl 2.2) which has type TExp Double. So the type of the expression is now accurately reflected in the type of the syntax tree. Even the UTyp type now has a typed equivalent where the real type is reflected. For completeness, here's some code for pretty printing etc.
{-# LANGUAGE GADTs, ExistentialQuantification, PatternGuards #-}
module TExp(Id,
            TFun(..), TTyp(..), TExp(..), DblOp(..), BolOp(..), CmpOp(..),
            Equal(..), test,
            Type(..),
            AFun(..), extractFun,
            typeCheck, toTFun) where
import Data.Maybe
import Control.Monad
import UExp

instance Show (TFun a) where
    showsPrec p (TBody e) = showsPrec p e
    showsPrec p (TLam i t e) = showParen (p>0)
      (showString "\\ " . showParen True (showString i . showString " :: " . showsPrec 0 t) . showString " -> " . showsPrec 0 e)

instance Show (TTyp a) where
    showsPrec _ TTBol = showString "Bool"
    showsPrec _ TTDbl = showString "Double"
    showsPrec p (TTArr a b) = showParen (p>5) (showsPrec 6 a . showString " -> " . showsPrec 5 b)

instance Show (TExp a) where
    showsPrec p (TDbl d) = showsPrec p d
    showsPrec p (TBol b) = showsPrec p b
    showsPrec _ (TVar i) = showString i
    showsPrec p (TDblOp op a b) = showOp p (fromJust $ lookup op [(DMul, "*"), (DAdd, "+"), (DSub, "-"), (DDiv, "/")]) a b
    showsPrec p (TBolOp op a b) = showOp p (fromJust $ lookup op [(BAnd, "&&"), (BOr, "||")]) a b
    showsPrec p (TCmpOp op a b) = showOp p (fromJust $ lookup op [(CEq, "=="), (CLe, "<=")]) a b
    showsPrec p (TIf c t e) = showParen (p>0) (showString "if " . showsPrec 0 c . showString " then " . showsPrec 0 t . showString " else " . showsPrec 0 e)
    showsPrec p (TLet i _ e b) =
      showParen (p>0) (showString "let " . showString i . showString " = " . showsPrec 0 e . showString " in " . showsPrec 0 b)

The aim of the type checker is to transform from the UExp type to the TExp type, so basically
typeCheckExp :: UExp -> TExp a
But things can go wrong, so it's impossible to always return a TExp, so let's use a Maybe type:
typeCheckExp :: UExp -> Maybe (TExp a)
But wait! This type is totally wrong. Why? Because it promises that given a UExp the type checker can return any type, i.e., writing out the (normally implicit) quantifier the type is:
typeCheckExp :: forall a . UExp -> Maybe (TExp a)
But this is not the case, the type checker will figure out a type and return an expression with this specific type, so the type we really want is
typeCheckExp :: exists a . UExp -> Maybe (TExp a)
Haskell doesn't allow this type to be written this way; we need to package up the existential type in a data type. Like so:
data ATExp = forall a . TExp a ::: TTyp a

data AFun = forall a . AFun (TFun a) (TTyp a)
It might look funny that the existential type is written with a forall, but it makes sense when looking at the type of the constructor function (but not when doing pattern matching). Now we can attempt a couple of cases of the type checker:
typeCheckExp :: UExp -> Maybe ATExp
typeCheckExp (UDbl d) =
    return $ TDbl d ::: TTDbl
typeCheckExp (UBol b) =
    return $ TBol b ::: TTBol
They look quite nice, and they actually work. So what about something more complicated, like arithmetic?
typeCheckExp (UApp op [a, b]) | Just dop <- lookup op [("+", DAdd), ("-", DSub), ("*", DMul), ("/", DDiv)] = do
    a' ::: TTDbl <- typeCheckExp a
    b' ::: TTDbl <- typeCheckExp b
    return $ TDblOp dop a' b' ::: TTDbl
First we conveniently look up the operator among the arithmetic operators, then we recursively call the type checker for the operands. We do this in the Maybe monad. If the type checking a subterm fails that's automatically propagated, and furthermore, if the type checking of a subterm does not yield a TTDbl type then this will cause the pattern matching to fail, and this will generate a Nothing in the maybe monad, so we used failing pattern matching to our advantage here. The interesting case is checking UIf, because here both arms have to have the same type, but we don't know which one. Here's an attempt:
typeCheckExp (UApp "if" [c,t,e]) = do
    c' ::: TTBol <- typeCheckExp c
    t' ::: tt    <- typeCheckExp t
    e' ::: te    <- typeCheckExp e
    guard (tt == te)
    return $ TIf c' t' e' ::: tt
But this doesn't type check. The guard ensures that the two arms have the same type, but that's something we know, but the Haskell type checker doesn't. So it rejects the TIf, because it can't see that both arms have the same type. We need to be trickier in doing the equality test so that it reflects the equality on the type level. There's a standard trick for this, namely this type:
data Equal a b where
    Eq :: Equal a a
If you ever have a value (which must be Eq) of type Equal foo bar then the type checker will know that foo and bar are actually the same type. So let's code equality for TTyp.
test :: TTyp a -> TTyp b -> Maybe (Equal a b)
test TTBol TTBol = return Eq
test TTDbl TTDbl = return Eq
test (TTArr a b) (TTArr a' b') = do
    Eq <- test a a'
    Eq <- test b b'
    return Eq
test _ _ = mzero
This code is worth pondering for a while, it's actually rather clever (I take no credit for it; I stole it from Tim Sheard). Why does even the first clause type check? Because TTBol has type TTyp Bool, so both the type variables (a and b) must be Bool in the first clause, which means that Eq :: Equal Bool Bool is what we're returning. Equipped with this equality we can try type checking again.
typeCheckExp (UApp "if" [c,t,e]) = do
    c' ::: TTBol <- typeCheckExp c
    t' ::: tt    <- typeCheckExp t
    e' ::: te    <- typeCheckExp e
    Eq <- test tt te
    return $ TIf c' t' e' ::: tt
And amazingly this actually works! (A tribute to the hard working ghc implementors.) One (rather large) fly is left in the ointment. What about variables? What do we do when we type check UVar? We must check that there's a bound variable with the right type around. So the type checker needs to be extended with an environment where variables can be looked up. It's mostly straight forward. The environment simply maps a variable to ATExp. So here's the complete type checker as it's actually defined.
type Env = [(Id, ATExp)]

typeCheckExp :: Env -> UExp -> Maybe ATExp
typeCheckExp _ (UDbl d) =
    return $ TDbl d ::: TTDbl
typeCheckExp _ (UBol b) =
    return $ TBol b ::: TTBol
typeCheckExp r (UApp op [a, b]) | Just dop <- lookup op [("+", DAdd), ("-", DSub), ("*", DMul), ("/", DDiv)] = do
    a' ::: TTDbl <- typeCheckExp r a
    b' ::: TTDbl <- typeCheckExp r b
    return $ TDblOp dop a' b' ::: TTDbl
typeCheckExp r (UApp op [a, b]) | Just bop <- lookup op [("&&", BAnd), ("||", BOr)] = do
    a' ::: TTBol <- typeCheckExp r a
    b' ::: TTBol <- typeCheckExp r b
    return $ TBolOp bop a' b' ::: TTBol
typeCheckExp r (UApp op [a, b]) | Just cop <- lookup op [("==", CEq), ("<=", CLe)] = do
    a' ::: TTDbl <- typeCheckExp r a
    b' ::: TTDbl <- typeCheckExp r b
    return $ TCmpOp cop a' b' ::: TTBol
typeCheckExp r (UApp "if" [c,t,e]) = do
    c' ::: TTBol <- typeCheckExp r c
    t' ::: tt    <- typeCheckExp r t
    e' ::: te    <- typeCheckExp r e
    Eq <- test tt te
    return $ TIf c' t' e' ::: tt
typeCheckExp r (ULet i e b) = do
    e' ::: te <- typeCheckExp r e
    b' ::: tb <- typeCheckExp ((i, TVar i ::: te) : r) b
    return $ TLet i te e' b' ::: tb
typeCheckExp r (UVar i) =
    lookup i r
typeCheckExp _ _ =
    mzero
Note the ULet case which extends the environment. First we type check the expression that's being bound, and then add a variable to the environment and type check the body. Finally we need to type check the top level:
typeCheck :: UFun -> Maybe AFun
typeCheck = typeCheckFun []

typeCheckFun :: Env -> UFun -> Maybe AFun
typeCheckFun n (UFun [] b) = do
    e ::: t <- typeCheckExp n b
    return $ AFun (TBody e) t
typeCheckFun n (UFun ((x, typ):vts) b) =
    case typ of
    UTBol -> f TTBol
    UTDbl -> f TTDbl
  where f t = do AFun e r <- typeCheckFun ((x, TVar x ::: t) : n) (UFun vts b); return $ AFun (TLam x t e) (TTArr t r)
When encountering the expression we just type check it, and for an argument we add a variable with the right type to the environment. A small test in ghci:
TExp UExp> mParseUFun "\\ (x::Double) -> x+1" >>= typeCheck
Just (\ (x :: Double) -> x+1.0 :: Double -> Double)
To be able to extract a function from ATFun we need some small utilties.
class Type a where
    theType :: TTyp a
instance Type Double where
    theType = TTDbl
instance Type Bool where
    theType = TTBol
instance (Type a, Type b) => Type (a->b) where
    theType = TTArr theType theType

extractFun :: (Type a) => AFun -> Maybe (TFun a)
extractFun = extract theType

extract :: TTyp a -> AFun -> Maybe (TFun a)
extract s (AFun e t) = do
    Eq <- test t s
    return e

toTFun :: (Type a) => UFun -> Maybe (TFun a)
toTFun = extractFun <=< typeCheck
The class Type allows us to construct the TTyp corresponding to a Haskell type via overloading. Using this and the test function we can then extract a TFun at any type we like. If we try to extract at the wrong type we'll just get Nothing and at the right type we get Just.

The Compiler module

Now all we need to do is to write a function translate that translates a TFun a into the corresponding a. Naturally, using LLVM. Let's start with some simple cases in translating literals to LLVM code.
compileExp :: TExp a -> CodeGenFunction r (Value a)
compileExp (TDbl d) = return $ valueOf d
compileExp (TBol b) = return $ valueOf b
The valueOf function is simply the one that lifts a Haskell value into an LLVM value. Note how nice the GADT works out here and we handle both Double and Bool with any need to compromise type safety. What about arithmetic? Equally easy.
compileExp r (TDblOp op e1 e2) = bind2 (dblOp op) (compileExp r e1) (compileExp r e2)

dblOp :: DblOp -> Value Double -> Value Double -> CodeGenFunction r (Value Double)
dblOp DAdd = add
dblOp DSub = sub
dblOp DMul = mul
dblOp DDiv = fdiv

-- This should be in Control.Monad
bind2 :: (Monad m) => (a -> b -> m c) -> m a -> m b -> m c
bind2 f m1 m2 = do
    x1 <- m1
    x2 <- m2
    f x1 x2
And we can just carry on:
compileExp (TBolOp op e1 e2) = bind2 (bolOp op) (compileExp e1) (compileExp e2)
compileExp (TCmpOp op e1 e2) = bind2 (cmpOp op) (compileExp e1) (compileExp e2)
compileExp (TIf b t e) = mkIf (compileExp b) (compileExp t) (compileExp e)

bolOp :: BolOp -> Value Bool -> Value Bool -> CodeGenFunction r (Value Bool)
bolOp BAnd = and
bolOp BOr  = or

cmpOp :: CmpOp -> Value Double -> Value Double -> CodeGenFunction r (Value Bool)
cmpOp CEq = fcmp FPOEQ
cmpOp CLe = fcmp FPOLE
(The && and || are not short circuiting in this implementation. It would be easy to change.) It's rather amazing that despite these different branches producing and consuming different types it all works out. It's perfectly type safe and free from coercions. This is the beauty of GADTs. Oh, yeah, mkIf. It's just a piece of mess to create some basic blocks, test, and jump.
mkIf :: (IsFirstClass a) =>
        CodeGenFunction r (Value Bool) -> CodeGenFunction r (Value a) -> CodeGenFunction r (Value a) -> CodeGenFunction r (Value a)
mkIf mb mt me = do
    b <- mb
    tb <- newBasicBlock
    eb <- newBasicBlock
    jb <- newBasicBlock
    condBr b tb eb
    defineBasicBlock tb
    t <- mt
    br jb
    defineBasicBlock eb
    e <- me
    br jb
    defineBasicBlock jb
    phi [(t, tb), (e, eb)]
OK, so was lying. The translate function is not quite as easy as that. Just as with type checking we need an environment because of variables. It's easy to add though, and here's the real code.
compileExp :: (Type a, IsFirstClass a) => Env -> TExp a -> CodeGenFunction r (Value a)
compileExp _ (TDbl d) = return $ valueOf d
compileExp _ (TBol b) = return $ valueOf b
compileExp r (TDblOp op e1 e2) = bind2 (dblOp op) (compileExp r e1) (compileExp r e2)
compileExp r (TBolOp op e1 e2) = bind2 (bolOp op) (compileExp r e1) (compileExp r e2)
compileExp r (TCmpOp op e1 e2) = bind2 (cmpOp op) (compileExp r e1) (compileExp r e2)
compileExp r (TIf b t e) = mkIf (compileExp r b) (compileExp r t) (compileExp r e)
compileExp r (TLet i t e b) = do
    e' <- compileExp' r t e
    compileExp ((i, AValue e' t):r) b
compileExp r (TVar i) = return $ fromJust $ castAValue theType =<< lookup i r   -- lookup cannot fail on type checked code

compileExp' :: Env -> TTyp a -> TExp a -> CodeGenFunction r (Value a)
compileExp' r TTDbl e = compileExp r e
compileExp' r TTBol e = compileExp r e
compileExp' _ _ _ = error $ "compileExp': functions not allowed yet"

data AValue = forall a . AValue (Value a) (TTyp a)

castAValue :: TTyp a -> AValue -> Maybe (Value a)
castAValue t (AValue v s) = do
    Eq <- test t s
    return v

type Env = [(Id, AValue)]
Exactly as for the type checking environment we stick the code generation in an environment, and use castAValue project it out of the existential container. The fromJust call in the TVar case cannot fail on type checked code, but with my string based variable representation I have no evidence of this in the TExp so there's actually a cast in the variable case that can fail if scope and type checking has not been performed. The compileExp' is placate the type checker and help it with some evidence about that we are only binding base values. The rest of the code generation module is just house keeping. It's a little ugly, but not terrible.
-- | Compile a TFun into the corresponding LLVM code.
compileFunction :: (Translate a) =>
                   TFun a -> CodeGenModule (Function (RetIO a))
compileFunction = createFunction ExternalLinkage . compileFun []

class Compile a where
    type CG a
    type RetIO a
    type Returns a
    compileFun :: Env -> TFun a -> CG a

instance Compile Double where
    type CG Double = CodeGenFunction Double ()
    type RetIO Double = IO Double
    type Returns Double = Double
    compileFun r (TBody e) = compileExp r e >>= ret
    -- TLam is not well typed

instance Compile Bool where
    type CG Bool = CodeGenFunction Bool ()
    type RetIO Bool = IO Bool
    type Returns Bool = Bool
    compileFun r (TBody e) = compileExp r e >>= ret
    -- TLam is not well typed

instance (Type a, Compile b) => Compile (a -> b) where
    type CG (a->b) = Value a -> CG b
    type RetIO (a->b) = a -> RetIO b
    type Returns (a->b) = Returns b
    -- TBody is not well typed
    compileFun r (TLam i t e) = \ x -> compileFun ((i, AValue x t):r) e
The verbosity and large number of type functions in this section has convinced me that I need to simplify some of the types and classes involved in the LLVM code generation. To convert and LLVM module we call the JIT. This produces a function that returns a value in the IO monad (to be on the safe side) so we need to get rid of the IO, and finally we can get rid of the top level IO, because externally what we are doing is really pure (in some sense).
translate :: (Translate a) => TFun a -> a
translate = unsafePerformIO . fmap unsafePurify . simpleFunction . compileFunction
The Translate context is just an abbreviation for a big context enforced by the LLVM functions. It looks horrendous, but the type checker figured it out for me and I just pasted it in.
{-# LANGUAGE TypeFamilies, FlexibleContexts, ExistentialQuantification, FlexibleInstances, UndecidableInstances #-}
module Compile(Translate, translate) where
import Data.Maybe
import Prelude hiding (and, or)
import TExp
import LLVM.Core hiding (CmpOp)
import LLVM.ExecutionEngine
import System.IO.Unsafe(unsafePerformIO)

class    (Type a,
          Unsafe (RetIO a) a,
          FunctionArgs (RetIO a) (CG a) (CodeGenFunction (Returns a) ()),
          IsFunction (RetIO a),
          Compile a,
          Translatable (RetIO a)) =>
    Translate a
instance (Type a,
          Unsafe (RetIO a) a,
          FunctionArgs (RetIO a) (CG a) (CodeGenFunction (Returns a) ()),
          IsFunction (RetIO a),
          Compile a,
          Translatable (RetIO a)) =>
    Translate a

Conclusion

And that concludes the three parts of the compiler. In about 400 lines of code we can compile a small subset of Haskell expressions to (efficient) machine code. After type checking the rest of the processing is done in a type safe manner (except for a cast in TVar) which is the intention of the high level LLVM interface. Oh, and if you instrument the code generator a little you can peek at the machine code being produced. For instance, for this input to compile
\ (x::Double) ->
let y = x*(x-1) in
let z = x/y + 1 in
if y <= 0 then 0 else 1/(y-z)
we get
__fun1:
 subl $12, %esp
 movsd LCPI1_0, %xmm0
 movsd 16(%esp), %xmm1
 movapd %xmm1, %xmm2
 subsd %xmm0, %xmm2
 mulsd %xmm1, %xmm2
 pxor %xmm3, %xmm3
 ucomisd %xmm2, %xmm3
 jae LBB1_3
LBB1_1:
 divsd %xmm2, %xmm1
 addsd %xmm0, %xmm1
 subsd %xmm1, %xmm2
 movsd LCPI1_0, %xmm0
 divsd %xmm2, %xmm0
LBB1_2:
 movsd %xmm0, (%esp)
 fldl (%esp)
 addl $12, %esp
 ret
LBB1_3:
 pxor %xmm0, %xmm0
 jmp LBB1_2

10 comments:

  1. I love the Maybe (Equal a b) monadic syntax trick. Although it appears magic, here's how it works:

    do {Eq <- expr ; ...}
    desugars into
    expr >>= \e -> case e of { Eq -> do {...} }

    On the inside of the case, we have successfully pattern-matched on Eq :: Equal a b, so the type equality (a ~ b) is in scope; we can use the two types interchangeably! But the inside of the case is the entire rest of the "do" block, allowing "return Eq" to work.

    ReplyDelete
  2. Thanks so much for this extensive post! I was the fellow on #haskell :) I went another route (simpler, with much less type-safety), but this will definitely be on my mind when I go to refactor my existing stuff. Thanks again!

    ReplyDelete
  3. Thanks for posting this. It's nice to see such a complete example.

    ReplyDelete
  4. This comment has been removed by the author.

    ReplyDelete
  5. I was confused by:

    Because TTBol has type TTyp Bool, so both the type variables (a and b) must be TTBool in the first clause, which means that Eq :: Equal TBol TBol is what we're returning.

    TTBool does not exist, and TBol is a data constructor, not a type. Did you mean that both type variables (a and b) must be Bool in the first clause, which means that Eq :: Equal Bool Bool is what we're returning?

    ReplyDelete
  6. TTBol exists and is a data constructor and it has type TTyp Bool.

    ReplyDelete
  7. I was bitten by LLVM curiosity and decided to try this out as a way to explore Haskell's LLVM bindings.

    All the code appears to compile on GHC 7.4.2, except for the test code as given at the top of the article.

    I get:
    Couldn't match type `Double' with `CodeGenFunction Double ()'
    When using functional dependencies to combine
    FunctionArgs
    (IO Double) (llvm-3.0.1.0:LLVM.Core.CodeGen.FA Double) Double,
    arising from the dependency `f -> g r'
    in the instance declaration in `llvm-3.0.1.0:LLVM.Core.CodeGen'
    FunctionArgs
    (IO Double)
    (CodeGenFunction Double ())
    (CodeGenFunction Double ()),
    arising from a use of `compile' at src/Compile.hs:144:18-24
    In the expression:
    compile "\\ (x::Double) -> if x == 0 then 0 else 1/(x*x)"
    In a pattern binding:
    Just f = compile "\\ (x::Double) -> if x == 0 then 0 else 1/(x*x)"

    At this stage my type-fu isn't strong enough for an 'ah-ha' just looking at this (along with my unfamiliarity of all the LLVM and compiler types here).

    I'm hoping this isn't anything dumb on my part, but I don't think I've missed anything.

    Any help greatly appreciated as I'd love to use this as a basis for experimentation. Thanks!

    ReplyDelete