## More LLVM

Recently someone asked me on #haskell if you could use the Haskell LLVM bindings to compile some abstract syntax to a Haskell function. Naturally I said yes, but then I realized I had only done it for a boring language with just one type. I had no doubt that it could be done for more complicated languages with multiple types, but it might not be totally obvious how. So I decided to write a simple compiler, and this blog post is the result. First, a simple example:main = do let f :: Double -> Double Just f = compile "\\ (x::Double) -> if x == 0 then 0 else 1/(x*x)" print (f 2, f 3, f 0)Running this program produces (as expected)

(0.25,0.1111111111111111,0.0)What has happened is that the string has been parsed to an abstract syntax tree, translated into LLVM code, then to machine code, and finally turned back into a Haskell callable function. Many things can go wrong along the way, like syntax and type errors, so

`compile`returns a

`Maybe`type to indicate if things went right or wrong. (A more serious version of the

`compile`function would return an error message when something has gone wrong.) The definition of the compilation function is simple and illustrates the flow of the compiler

compile :: (Translate a) => String -> Maybe a compile = fmap translate . toTFun <=< mParseUFunThe context

`Translate`is there to limit the types that can actually be translated; it's a necessary evil and exactly what types are allowed depends on how advanced we make the compiler. Had we ignored the

`Maybe`type the definitions would have been

compile = translate . toTFun . mParseUFunwhich says, first parse to the type

`UFun`(untyped expressions), then type check and turn it into the type

`TFun a`, and finally translate

`TFun a`into an

`a`by LLVM compilation. Let's see how this all works.

### The UExp module

The first step is to just define an abstract syntax for the expressions that we want to handle. I'm only allowing leading lambdas (this a very first order language), so there's a distiction between the top level`UFun`type and the expression type

`UExp`. The U prefix indicates that this version of the syntax is not yet type checked. The definition is pretty boring, but here it is:

{-# OPTIONS_GHC -fno-warn-unused-binds #-} {-# LANGUAGE RecordWildCards #-} module UExp(Id, UFun(..), UTyp(..), UExp(..), parseUFun, showOp, mParseUFun) where import Data.Maybe import Data.List import Data.Function import Text.ParserCombinators.Parsec import Text.ParserCombinators.Parsec.Expr import Text.ParserCombinators.Parsec.Token import Text.ParserCombinators.Parsec.Language type Id = String data UFun = UFun [(Id, UTyp)] UExp data UTyp = UTBol | UTDbl data UExp = UDbl Double -- ^ Double literal | UBol Bool -- ^ Bool literal | UVar Id -- ^ Variable | UApp Id [UExp] -- ^ Function application | ULet Id UExp UExp -- ^ Local bindingNaturally, we want to be able to show the expressions, if nothing else so for debugging. So I make a

`Show`instance that shows them in a nice way respecting operator precedences etc. There's nothing exciting going on, the large number of lines is just to cover operator printing.

instance Show UFun where showsPrec p (UFun [] e) = showsPrec p e showsPrec p (UFun vts e) = showParen (p>0) (showString "\\ " . foldr (.) (showString "-> ") (map f vts) . showsPrec 0 e) where f (v, t) = showParen True (showString v . showString " :: " . showsPrec 0 t) . showString " " instance Show UTyp where showsPrec _ UTDbl = showString "Double" showsPrec _ UTBol = showString "Bool" instance Show UExp where showsPrec p (UDbl d) = showsPrec p d showsPrec p (UBol b) = showsPrec p b showsPrec _ (UVar i) = showString i showsPrec p (UApp "if" [c, t, e]) = showParen (p>0) (showString "if " . showsPrec 0 c . showString " then " . showsPrec 0 t . showString " else " . showsPrec 0 e) showsPrec p (UApp op [a, b]) = showOp p op a b showsPrec _ (UApp op _) = error $ "Uxp.show " ++ op showsPrec p (ULet i e b) = showParen (p>0) (showString "let " . showString i . showString " = " . showsPrec 0 e . showString " in " . showsPrec 0 b) showOp :: (Show a, Show b) => Int -> String -> a -> b -> String -> String showOp q sop a b = showParen (q>mp) (showsPrec lp a . showString sop . showsPrec rp b) where (lp,mp,rp) = case lookup sop ops of Just (p, AssocLeft) -> (p, p, p+1) Just (p, AssocRight) -> (p+1, p, p) Just (p, AssocNone) -> (p+1, p, p+1) Nothing -> (9, 9, 10) ops :: [(String, (Int, Assoc))] ops = [("+", (6, AssocLeft)), ("-", (6, AssocLeft)), ("*", (7, AssocLeft)), ("/", (7, AssocLeft)), ("==", (4, AssocNone)), ("<=", (4, AssocNone)), ("&&", (3, AssocRight)), ("||", (2, AssocRight)) ]We also want to be able to parse, so I'm using

`Parsec`to parse the string and produce an AST. Again, there's nothing interesting going on. I use the Haskell lexical analysis provided by

`Parsec`. This is available as a

`TokenParser`record, which can be conveniently opened with the

`RecordWildcard`notation

`TokenParser{..}`.

parseUFun :: SourceName -> String -> Either ParseError UFun parseUFun = parse $ do f <- pFun; eof; return f where TokenParser{..} = haskell pFun = do vts <- between (reservedOp "\\") (reservedOp "->") (many $ parens $ do v <- identifier; reservedOp "::"; t <- pTyp; return (v, t)) <|> return [] e <- pExp return $ UFun vts e pTyp = choice [do reserved "Bool"; return UTBol, do reserved "Double"; return UTDbl] pExp = choice [pIf, pLet, pOExp] pIf = do reserved "if"; c <- pExp; reserved "then"; t <- pExp; reserved "else"; e <- pExp; return $ UApp "if" [c, t, e] pLet = do reserved "let"; i <- identifier; reservedOp "="; e <- pExp; reserved "in"; b <- pExp; return $ ULet i e b pOExp = buildExpressionParser opTable pAExp pAExp = choice [pDbl, pVar, parens pExp] pVar = fmap eVar identifier pDbl = fmap (either (UDbl . fromInteger) UDbl) naturalOrFloat eVar i = if i == "False" then UBol False else if i == "True" then UBol True else UVar i opTable = reverse . map (map mkOp) . groupBy ((==) `on` prec) . sortBy (compare `on` prec) $ ops where mkOp (s, (_, a)) = Infix (do reservedOp s; return $ \ x y -> UApp s [x, y]) a prec = fst . snd mParseUFun :: String -> Maybe UFun mParseUFun = either (const Nothing) Just . (parseUFun "")The parser is packaged up in

`mParseUFun`which returns an AST if it all worked.

### The TExp module

Since the LLVM API is typed it's much easier to translate a typed abstract syntax tree than an untyped abstract syntax tree. The`TExp`module contains the definition of the typed AST and the type checker that converts to it. There are many ways to formulate type safe abstract syntax trees. I've chosen to use GADTs. I've also picked to represent variables (still) by identifiers, which means that the syntax tree is not necessarily type safe. Furthermore, I've chosen a very limited way to represent function application since this is all I need for this example. The variantions on this are endless.

{-# LANGUAGE GADTs, ExistentialQuantification, PatternGuards #-} module TExp(Id, TFun(..), TTyp(..), TExp(..), DblOp(..), BolOp(..), CmpOp(..), Equal(..), test, Type(..), AFun(..), extractFun, typeCheck, toTFun) where import Data.Maybe import Control.Monad import UExp data TFun a where TBody :: TExp a -> TFun a TLam :: Id -> TTyp a -> TFun b -> TFun (a->b) data TTyp a where TTBol :: TTyp Bool TTDbl :: TTyp Double TTArr :: TTyp a -> TTyp b -> TTyp (a->b) data TExp a where TDbl :: Double -> TExp Double TBol :: Bool -> TExp Bool TDblOp :: DblOp -> TExp Double -> TExp Double -> TExp Double TBolOp :: BolOp -> TExp Bool -> TExp Bool -> TExp Bool TCmpOp :: CmpOp -> TExp Double -> TExp Double -> TExp Bool TIf :: TExp Bool -> TExp a -> TExp a -> TExp a TLet :: Id -> TTyp a -> TExp a -> TExp b -> TExp b TVar :: Id -> TExp a data DblOp = DAdd | DSub | DMul | DDiv deriving (Eq, Show) data BolOp = BAnd | BOr deriving (Eq, Show) data CmpOp = CEq | CLe deriving (Eq, Show)So for instance,

`UApp "+" [UVar "x", UDbl 2.2]`will be represented by

`TDblOp DAdd (TVar "x") (TDbl 2.2)`which has type

`TExp Double`. So the type of the expression is now accurately reflected in the type of the syntax tree. Even the

`UTyp`type now has a typed equivalent where the real type is reflected. For completeness, here's some code for pretty printing etc.

{-# LANGUAGE GADTs, ExistentialQuantification, PatternGuards #-} module TExp(Id, TFun(..), TTyp(..), TExp(..), DblOp(..), BolOp(..), CmpOp(..), Equal(..), test, Type(..), AFun(..), extractFun, typeCheck, toTFun) where import Data.Maybe import Control.Monad import UExp instance Show (TFun a) where showsPrec p (TBody e) = showsPrec p e showsPrec p (TLam i t e) = showParen (p>0) (showString "\\ " . showParen True (showString i . showString " :: " . showsPrec 0 t) . showString " -> " . showsPrec 0 e) instance Show (TTyp a) where showsPrec _ TTBol = showString "Bool" showsPrec _ TTDbl = showString "Double" showsPrec p (TTArr a b) = showParen (p>5) (showsPrec 6 a . showString " -> " . showsPrec 5 b) instance Show (TExp a) where showsPrec p (TDbl d) = showsPrec p d showsPrec p (TBol b) = showsPrec p b showsPrec _ (TVar i) = showString i showsPrec p (TDblOp op a b) = showOp p (fromJust $ lookup op [(DMul, "*"), (DAdd, "+"), (DSub, "-"), (DDiv, "/")]) a b showsPrec p (TBolOp op a b) = showOp p (fromJust $ lookup op [(BAnd, "&&"), (BOr, "||")]) a b showsPrec p (TCmpOp op a b) = showOp p (fromJust $ lookup op [(CEq, "=="), (CLe, "<=")]) a b showsPrec p (TIf c t e) = showParen (p>0) (showString "if " . showsPrec 0 c . showString " then " . showsPrec 0 t . showString " else " . showsPrec 0 e) showsPrec p (TLet i _ e b) = showParen (p>0) (showString "let " . showString i . showString " = " . showsPrec 0 e . showString " in " . showsPrec 0 b)The aim of the type checker is to transform from the

`UExp`type to the

`TExp`type, so basically

typeCheckExp :: UExp -> TExp aBut things can go wrong, so it's impossible to always return a

`TExp`, so let's use a

`Maybe`type:

typeCheckExp :: UExp -> Maybe (TExp a)But wait! This type is totally wrong. Why? Because it promises that given a

`UExp`the type checker can return

**any**type, i.e., writing out the (normally implicit) quantifier the type is:

typeCheckExp :: forall a . UExp -> Maybe (TExp a)But this is not the case, the type checker will figure out a type and return an expression with this specific type, so the type we really want is

typeCheckExp :: exists a . UExp -> Maybe (TExp a)Haskell doesn't allow this type to be written this way; we need to package up the existential type in a data type. Like so:

data ATExp = forall a . TExp a ::: TTyp a data AFun = forall a . AFun (TFun a) (TTyp a)It might look funny that the existential type is written with a forall, but it makes sense when looking at the type of the constructor function (but not when doing pattern matching). Now we can attempt a couple of cases of the type checker:

typeCheckExp :: UExp -> Maybe ATExp typeCheckExp (UDbl d) = return $ TDbl d ::: TTDbl typeCheckExp (UBol b) = return $ TBol b ::: TTBolThey look quite nice, and they actually work. So what about something more complicated, like arithmetic?

typeCheckExp (UApp op [a, b]) | Just dop <- lookup op [("+", DAdd), ("-", DSub), ("*", DMul), ("/", DDiv)] = do a' ::: TTDbl <- typeCheckExp a b' ::: TTDbl <- typeCheckExp b return $ TDblOp dop a' b' ::: TTDblFirst we conveniently look up the operator among the arithmetic operators, then we recursively call the type checker for the operands. We do this in the

`Maybe`monad. If the type checking a subterm fails that's automatically propagated, and furthermore, if the type checking of a subterm does not yield a

`TTDbl`type then this will cause the pattern matching to fail, and this will generate a

`Nothing`in the maybe monad, so we used failing pattern matching to our advantage here. The interesting case is checking

`UIf`, because here both arms have to have the same type, but we don't know which one. Here's an attempt:

typeCheckExp (UApp "if" [c,t,e]) = do c' ::: TTBol <- typeCheckExp c t' ::: tt <- typeCheckExp t e' ::: te <- typeCheckExp e guard (tt == te) return $ TIf c' t' e' ::: ttBut this doesn't type check. The guard ensures that the two arms have the same type, but that's something we know, but the Haskell type checker doesn't. So it rejects the

`TIf`, because it can't see that both arms have the same type. We need to be trickier in doing the equality test so that it reflects the equality on the type level. There's a standard trick for this, namely this type:

data Equal a b where Eq :: Equal a aIf you ever have a value (which must be

`Eq`) of type

`Equal foo bar`then the type checker will know that

`foo`and

`bar`are actually the same type. So let's code equality for

`TTyp`.

test :: TTyp a -> TTyp b -> Maybe (Equal a b) test TTBol TTBol = return Eq test TTDbl TTDbl = return Eq test (TTArr a b) (TTArr a' b') = do Eq <- test a a' Eq <- test b b' return Eq test _ _ = mzeroThis code is worth pondering for a while, it's actually rather clever (I take no credit for it; I stole it from Tim Sheard). Why does even the first clause type check? Because

`TTBol`has type

`TTyp Bool`, so both the type variables (a and b) must be

`Bool`in the first clause, which means that

`Eq :: Equal Bool Bool`is what we're returning. Equipped with this equality we can try type checking again.

typeCheckExp (UApp "if" [c,t,e]) = do c' ::: TTBol <- typeCheckExp c t' ::: tt <- typeCheckExp t e' ::: te <- typeCheckExp e Eq <- test tt te return $ TIf c' t' e' ::: ttAnd amazingly this actually works! (A tribute to the hard working ghc implementors.) One (rather large) fly is left in the ointment. What about variables? What do we do when we type check

`UVar`? We must check that there's a bound variable with the right type around. So the type checker needs to be extended with an environment where variables can be looked up. It's mostly straight forward. The environment simply maps a variable to

`ATExp`. So here's the complete type checker as it's actually defined.

type Env = [(Id, ATExp)] typeCheckExp :: Env -> UExp -> Maybe ATExp typeCheckExp _ (UDbl d) = return $ TDbl d ::: TTDbl typeCheckExp _ (UBol b) = return $ TBol b ::: TTBol typeCheckExp r (UApp op [a, b]) | Just dop <- lookup op [("+", DAdd), ("-", DSub), ("*", DMul), ("/", DDiv)] = do a' ::: TTDbl <- typeCheckExp r a b' ::: TTDbl <- typeCheckExp r b return $ TDblOp dop a' b' ::: TTDbl typeCheckExp r (UApp op [a, b]) | Just bop <- lookup op [("&&", BAnd), ("||", BOr)] = do a' ::: TTBol <- typeCheckExp r a b' ::: TTBol <- typeCheckExp r b return $ TBolOp bop a' b' ::: TTBol typeCheckExp r (UApp op [a, b]) | Just cop <- lookup op [("==", CEq), ("<=", CLe)] = do a' ::: TTDbl <- typeCheckExp r a b' ::: TTDbl <- typeCheckExp r b return $ TCmpOp cop a' b' ::: TTBol typeCheckExp r (UApp "if" [c,t,e]) = do c' ::: TTBol <- typeCheckExp r c t' ::: tt <- typeCheckExp r t e' ::: te <- typeCheckExp r e Eq <- test tt te return $ TIf c' t' e' ::: tt typeCheckExp r (ULet i e b) = do e' ::: te <- typeCheckExp r e b' ::: tb <- typeCheckExp ((i, TVar i ::: te) : r) b return $ TLet i te e' b' ::: tb typeCheckExp r (UVar i) = lookup i r typeCheckExp _ _ = mzeroNote the

`ULet`case which extends the environment. First we type check the expression that's being bound, and then add a variable to the environment and type check the body. Finally we need to type check the top level:

typeCheck :: UFun -> Maybe AFun typeCheck = typeCheckFun [] typeCheckFun :: Env -> UFun -> Maybe AFun typeCheckFun n (UFun [] b) = do e ::: t <- typeCheckExp n b return $ AFun (TBody e) t typeCheckFun n (UFun ((x, typ):vts) b) = case typ of UTBol -> f TTBol UTDbl -> f TTDbl where f t = do AFun e r <- typeCheckFun ((x, TVar x ::: t) : n) (UFun vts b); return $ AFun (TLam x t e) (TTArr t r)When encountering the expression we just type check it, and for an argument we add a variable with the right type to the environment. A small test in ghci:

TExp UExp> mParseUFun "\\ (x::Double) -> x+1" >>= typeCheck Just (\ (x :: Double) -> x+1.0 :: Double -> Double)To be able to extract a function from

`ATFun`we need some small utilties.

class Type a where theType :: TTyp a instance Type Double where theType = TTDbl instance Type Bool where theType = TTBol instance (Type a, Type b) => Type (a->b) where theType = TTArr theType theType extractFun :: (Type a) => AFun -> Maybe (TFun a) extractFun = extract theType extract :: TTyp a -> AFun -> Maybe (TFun a) extract s (AFun e t) = do Eq <- test t s return e toTFun :: (Type a) => UFun -> Maybe (TFun a) toTFun = extractFun <=< typeCheckThe class

`Type`allows us to construct the

`TTyp`corresponding to a Haskell type via overloading. Using this and the

`test`function we can then extract a

`TFun`at any type we like. If we try to extract at the wrong type we'll just get

`Nothing`and at the right type we get

`Just`.

### The Compiler module

Now all we need to do is to write a function`translate`that translates a

`TFun a`into the corresponding

`a`. Naturally, using LLVM. Let's start with some simple cases in translating literals to LLVM code.

compileExp :: TExp a -> CodeGenFunction r (Value a) compileExp (TDbl d) = return $ valueOf d compileExp (TBol b) = return $ valueOf bThe

`valueOf`function is simply the one that lifts a Haskell value into an LLVM value. Note how nice the GADT works out here and we handle both Double and Bool with any need to compromise type safety. What about arithmetic? Equally easy.

compileExp r (TDblOp op e1 e2) = bind2 (dblOp op) (compileExp r e1) (compileExp r e2) dblOp :: DblOp -> Value Double -> Value Double -> CodeGenFunction r (Value Double) dblOp DAdd = add dblOp DSub = sub dblOp DMul = mul dblOp DDiv = fdiv -- This should be in Control.Monad bind2 :: (Monad m) => (a -> b -> m c) -> m a -> m b -> m c bind2 f m1 m2 = do x1 <- m1 x2 <- m2 f x1 x2And we can just carry on:

compileExp (TBolOp op e1 e2) = bind2 (bolOp op) (compileExp e1) (compileExp e2) compileExp (TCmpOp op e1 e2) = bind2 (cmpOp op) (compileExp e1) (compileExp e2) compileExp (TIf b t e) = mkIf (compileExp b) (compileExp t) (compileExp e) bolOp :: BolOp -> Value Bool -> Value Bool -> CodeGenFunction r (Value Bool) bolOp BAnd = and bolOp BOr = or cmpOp :: CmpOp -> Value Double -> Value Double -> CodeGenFunction r (Value Bool) cmpOp CEq = fcmp FPOEQ cmpOp CLe = fcmp FPOLE(The

`&&`and

`||`are not short circuiting in this implementation. It would be easy to change.) It's rather amazing that despite these different branches producing and consuming different types it all works out. It's perfectly type safe and free from coercions. This is the beauty of GADTs. Oh, yeah,

`mkIf`. It's just a piece of mess to create some basic blocks, test, and jump.

mkIf :: (IsFirstClass a) => CodeGenFunction r (Value Bool) -> CodeGenFunction r (Value a) -> CodeGenFunction r (Value a) -> CodeGenFunction r (Value a) mkIf mb mt me = do b <- mb tb <- newBasicBlock eb <- newBasicBlock jb <- newBasicBlock condBr b tb eb defineBasicBlock tb t <- mt br jb defineBasicBlock eb e <- me br jb defineBasicBlock jb phi [(t, tb), (e, eb)]OK, so was lying. The

`translate`function is not quite as easy as that. Just as with type checking we need an environment because of variables. It's easy to add though, and here's the real code.

compileExp :: (Type a, IsFirstClass a) => Env -> TExp a -> CodeGenFunction r (Value a) compileExp _ (TDbl d) = return $ valueOf d compileExp _ (TBol b) = return $ valueOf b compileExp r (TDblOp op e1 e2) = bind2 (dblOp op) (compileExp r e1) (compileExp r e2) compileExp r (TBolOp op e1 e2) = bind2 (bolOp op) (compileExp r e1) (compileExp r e2) compileExp r (TCmpOp op e1 e2) = bind2 (cmpOp op) (compileExp r e1) (compileExp r e2) compileExp r (TIf b t e) = mkIf (compileExp r b) (compileExp r t) (compileExp r e) compileExp r (TLet i t e b) = do e' <- compileExp' r t e compileExp ((i, AValue e' t):r) b compileExp r (TVar i) = return $ fromJust $ castAValue theType =<< lookup i r -- lookup cannot fail on type checked code compileExp' :: Env -> TTyp a -> TExp a -> CodeGenFunction r (Value a) compileExp' r TTDbl e = compileExp r e compileExp' r TTBol e = compileExp r e compileExp' _ _ _ = error $ "compileExp': functions not allowed yet" data AValue = forall a . AValue (Value a) (TTyp a) castAValue :: TTyp a -> AValue -> Maybe (Value a) castAValue t (AValue v s) = do Eq <- test t s return v type Env = [(Id, AValue)]Exactly as for the type checking environment we stick the code generation in an environment, and use

`castAValue`project it out of the existential container. The

`fromJust`call in the

`TVar`case cannot fail on type checked code, but with my string based variable representation I have no evidence of this in the

`TExp`so there's actually a cast in the variable case that can fail if scope and type checking has not been performed. The

`compileExp'`is placate the type checker and help it with some evidence about that we are only binding base values. The rest of the code generation module is just house keeping. It's a little ugly, but not terrible.

-- | Compile a TFun into the corresponding LLVM code. compileFunction :: (Translate a) => TFun a -> CodeGenModule (Function (RetIO a)) compileFunction = createFunction ExternalLinkage . compileFun [] class Compile a where type CG a type RetIO a type Returns a compileFun :: Env -> TFun a -> CG a instance Compile Double where type CG Double = CodeGenFunction Double () type RetIO Double = IO Double type Returns Double = Double compileFun r (TBody e) = compileExp r e >>= ret -- TLam is not well typed instance Compile Bool where type CG Bool = CodeGenFunction Bool () type RetIO Bool = IO Bool type Returns Bool = Bool compileFun r (TBody e) = compileExp r e >>= ret -- TLam is not well typed instance (Type a, Compile b) => Compile (a -> b) where type CG (a->b) = Value a -> CG b type RetIO (a->b) = a -> RetIO b type Returns (a->b) = Returns b -- TBody is not well typed compileFun r (TLam i t e) = \ x -> compileFun ((i, AValue x t):r) eThe verbosity and large number of type functions in this section has convinced me that I need to simplify some of the types and classes involved in the LLVM code generation. To convert and LLVM module we call the JIT. This produces a function that returns a value in the IO monad (to be on the safe side) so we need to get rid of the IO, and finally we can get rid of the top level IO, because externally what we are doing is really pure (in some sense).

translate :: (Translate a) => TFun a -> a translate = unsafePerformIO . fmap unsafePurify . simpleFunction . compileFunctionThe

`Translate`context is just an abbreviation for a big context enforced by the LLVM functions. It looks horrendous, but the type checker figured it out for me and I just pasted it in.

{-# LANGUAGE TypeFamilies, FlexibleContexts, ExistentialQuantification, FlexibleInstances, UndecidableInstances #-} module Compile(Translate, translate) where import Data.Maybe import Prelude hiding (and, or) import TExp import LLVM.Core hiding (CmpOp) import LLVM.ExecutionEngine import System.IO.Unsafe(unsafePerformIO) class (Type a, Unsafe (RetIO a) a, FunctionArgs (RetIO a) (CG a) (CodeGenFunction (Returns a) ()), IsFunction (RetIO a), Compile a, Translatable (RetIO a)) => Translate a instance (Type a, Unsafe (RetIO a) a, FunctionArgs (RetIO a) (CG a) (CodeGenFunction (Returns a) ()), IsFunction (RetIO a), Compile a, Translatable (RetIO a)) => Translate a

### Conclusion

And that concludes the three parts of the compiler. In about 400 lines of code we can compile a small subset of Haskell expressions to (efficient) machine code. After type checking the rest of the processing is done in a type safe manner (except for a cast in`TVar`) which is the intention of the high level LLVM interface. Oh, and if you instrument the code generator a little you can peek at the machine code being produced. For instance, for this input to

`compile`

\ (x::Double) -> let y = x*(x-1) in let z = x/y + 1 in if y <= 0 then 0 else 1/(y-z)we get

__fun1: subl $12, %esp movsd LCPI1_0, %xmm0 movsd 16(%esp), %xmm1 movapd %xmm1, %xmm2 subsd %xmm0, %xmm2 mulsd %xmm1, %xmm2 pxor %xmm3, %xmm3 ucomisd %xmm2, %xmm3 jae LBB1_3 LBB1_1: divsd %xmm2, %xmm1 addsd %xmm0, %xmm1 subsd %xmm1, %xmm2 movsd LCPI1_0, %xmm0 divsd %xmm2, %xmm0 LBB1_2: movsd %xmm0, (%esp) fldl (%esp) addl $12, %esp ret LBB1_3: pxor %xmm0, %xmm0 jmp LBB1_2

## 10 Comments:

I love the Maybe (Equal a b) monadic syntax trick. Although it appears magic, here's how it works:

do {Eq <- expr ; ...}

desugars into

expr >>= \e -> case e of { Eq -> do {...} }

On the inside of the case, we have successfully pattern-matched on Eq :: Equal a b, so the type equality (a ~ b) is in scope; we can use the two types interchangeably! But the inside of the case is the entire rest of the "do" block, allowing "return Eq" to work.

Thanks so much for this extensive post! I was the fellow on #haskell :) I went another route (simpler, with much less type-safety), but this will definitely be on my mind when I go to refactor my existing stuff. Thanks again!

Nice post, thanks for the insights!

Thanks for posting this. It's nice to see such a complete example.

This comment has been removed by the author.

I was confused by:

Because TTBol has type TTyp Bool, so both the type variables (a and b) must be TTBool in the first clause, which means that Eq :: Equal TBol TBol is what we're returning.

TTBool does not exist, and TBol is a data constructor, not a type. Did you mean that both type variables (a and b) must be Bool in the first clause, which means that Eq :: Equal Bool Bool is what we're returning?

TTBol exists and is a data constructor and it has type TTyp Bool.

I've fixed the typos.

I've fixed the typos.

I was bitten by LLVM curiosity and decided to try this out as a way to explore Haskell's LLVM bindings.

All the code appears to compile on GHC 7.4.2, except for the test code as given at the top of the article.

I get:

Couldn't match type `Double' with `CodeGenFunction Double ()'

When using functional dependencies to combine

FunctionArgs

(IO Double) (llvm-3.0.1.0:LLVM.Core.CodeGen.FA Double) Double,

arising from the dependency `f -> g r'

in the instance declaration in `llvm-3.0.1.0:LLVM.Core.CodeGen'

FunctionArgs

(IO Double)

(CodeGenFunction Double ())

(CodeGenFunction Double ()),

arising from a use of `compile' at src/Compile.hs:144:18-24

In the expression:

compile "\\ (x::Double) -> if x == 0 then 0 else 1/(x*x)"

In a pattern binding:

Just f = compile "\\ (x::Double) -> if x == 0 then 0 else 1/(x*x)"

At this stage my type-fu isn't strong enough for an 'ah-ha' just looking at this (along with my unfamiliarity of all the LLVM and compiler types here).

I'm hoping this isn't anything dumb on my part, but I don't think I've missed anything.

Any help greatly appreciated as I'd love to use this as a basis for experimentation. Thanks!

Post a Comment

<< Home