More LLVM
Recently someone asked me on #haskell if you could use the Haskell LLVM bindings to compile some abstract syntax to a Haskell function. Naturally I said yes, but then I realized I had only done it for a boring language with just one type. I had no doubt that it could be done for more complicated languages with multiple types, but it might not be totally obvious how. So I decided to write a simple compiler, and this blog post is the result. First, a simple example:
main = do
let f :: Double -> Double
Just f = compile "\\ (x::Double) -> if x == 0 then 0 else 1/(x*x)"
print (f 2, f 3, f 0)
Running this program produces (as expected)
(0.25,0.1111111111111111,0.0)
What has happened is that the string has been parsed to an abstract syntax tree, translated into LLVM code, then to machine code, and finally turned back into a Haskell callable function. Many things can go wrong along the way, like syntax and type errors, so
compile returns a
Maybe type to indicate if things went right or wrong. (A more serious version of the
compile function would return an error message when something has gone wrong.) The definition of the compilation function is simple and illustrates the flow of the compiler
compile :: (Translate a) => String -> Maybe a
compile = fmap translate . toTFun <=< mParseUFun
The context
Translate is there to limit the types that can actually be translated; it's a necessary evil and exactly what types are allowed depends on how advanced we make the compiler. Had we ignored the
Maybe type the definitions would have been
compile = translate . toTFun . mParseUFun
which says, first parse to the type
UFun (untyped expressions), then type check and turn it into the type
TFun a, and finally translate
TFun a into an
a by LLVM compilation. Let's see how this all works.
The UExp module
The first step is to just define an abstract syntax for the expressions that we want to handle. I'm only allowing leading lambdas (this a very first order language), so there's a distiction between the top level
UFun type and the expression type
UExp. The U prefix indicates that this version of the syntax is not yet type checked. The definition is pretty boring, but here it is:
{-# OPTIONS_GHC -fno-warn-unused-binds #-}
{-# LANGUAGE RecordWildCards #-}
module UExp(Id, UFun(..), UTyp(..), UExp(..), parseUFun, showOp, mParseUFun) where
import Data.Maybe
import Data.List
import Data.Function
import Text.ParserCombinators.Parsec
import Text.ParserCombinators.Parsec.Expr
import Text.ParserCombinators.Parsec.Token
import Text.ParserCombinators.Parsec.Language
type Id = String
data UFun = UFun [(Id, UTyp)] UExp
data UTyp = UTBol | UTDbl
data UExp
= UDbl Double -- ^ Double literal
| UBol Bool -- ^ Bool literal
| UVar Id -- ^ Variable
| UApp Id [UExp] -- ^ Function application
| ULet Id UExp UExp -- ^ Local binding
Naturally, we want to be able to show the expressions, if nothing else so for debugging. So I make a
Show instance that shows them in a nice way respecting operator precedences etc. There's nothing exciting going on, the large number of lines is just to cover operator printing.
instance Show UFun where
showsPrec p (UFun [] e) = showsPrec p e
showsPrec p (UFun vts e) = showParen (p>0) (showString "\\ " . foldr (.) (showString "-> ") (map f vts) . showsPrec 0 e)
where f (v, t) = showParen True (showString v . showString " :: " . showsPrec 0 t) . showString " "
instance Show UTyp where
showsPrec _ UTDbl = showString "Double"
showsPrec _ UTBol = showString "Bool"
instance Show UExp where
showsPrec p (UDbl d) = showsPrec p d
showsPrec p (UBol b) = showsPrec p b
showsPrec _ (UVar i) = showString i
showsPrec p (UApp "if" [c, t, e]) =
showParen (p>0) (showString "if " . showsPrec 0 c . showString " then " . showsPrec 0 t . showString " else " . showsPrec 0 e)
showsPrec p (UApp op [a, b]) = showOp p op a b
showsPrec _ (UApp op _) = error $ "Uxp.show " ++ op
showsPrec p (ULet i e b) =
showParen (p>0) (showString "let " . showString i . showString " = " . showsPrec 0 e . showString " in " . showsPrec 0 b)
showOp :: (Show a, Show b) => Int -> String -> a -> b -> String -> String
showOp q sop a b = showParen (q>mp) (showsPrec lp a . showString sop . showsPrec rp b)
where (lp,mp,rp) = case lookup sop ops of
Just (p, AssocLeft) -> (p, p, p+1)
Just (p, AssocRight) -> (p+1, p, p)
Just (p, AssocNone) -> (p+1, p, p+1)
Nothing -> (9, 9, 10)
ops :: [(String, (Int, Assoc))]
ops = [("+", (6, AssocLeft)),
("-", (6, AssocLeft)),
("*", (7, AssocLeft)),
("/", (7, AssocLeft)),
("==", (4, AssocNone)),
("<=", (4, AssocNone)),
("&&", (3, AssocRight)),
("||", (2, AssocRight))
]
We also want to be able to parse, so I'm using
Parsec to parse the string and produce an AST. Again, there's nothing interesting going on. I use the Haskell lexical analysis provided by
Parsec. This is available as a
TokenParser record, which can be conveniently opened with the
RecordWildcard notation
TokenParser{..}.
parseUFun :: SourceName -> String -> Either ParseError UFun
parseUFun = parse $ do f <- pFun; eof; return f
where TokenParser{..} = haskell
pFun = do
vts <- between (reservedOp "\\")
(reservedOp "->")
(many $ parens $ do v <- identifier; reservedOp "::"; t <- pTyp; return (v, t))
<|> return []
e <- pExp
return $ UFun vts e
pTyp = choice [do reserved "Bool"; return UTBol, do reserved "Double"; return UTDbl]
pExp = choice [pIf, pLet, pOExp]
pIf = do reserved "if"; c <- pExp; reserved "then"; t <- pExp; reserved "else"; e <- pExp; return $ UApp "if" [c, t, e]
pLet = do reserved "let"; i <- identifier; reservedOp "="; e <- pExp; reserved "in"; b <- pExp; return $ ULet i e b
pOExp = buildExpressionParser opTable pAExp
pAExp = choice [pDbl, pVar, parens pExp]
pVar = fmap eVar identifier
pDbl = fmap (either (UDbl . fromInteger) UDbl) naturalOrFloat
eVar i = if i == "False" then UBol False else if i == "True" then UBol True else UVar i
opTable = reverse . map (map mkOp) . groupBy ((==) `on` prec) . sortBy (compare `on` prec) $ ops
where mkOp (s, (_, a)) = Infix (do reservedOp s; return $ \ x y -> UApp s [x, y]) a
prec = fst . snd
mParseUFun :: String -> Maybe UFun
mParseUFun = either (const Nothing) Just . (parseUFun "")
The parser is packaged up in
mParseUFun which returns an AST if it all worked.
The TExp module
Since the LLVM API is typed it's much easier to translate a typed abstract syntax tree than an untyped abstract syntax tree. The
TExp module contains the definition of the typed AST and the type checker that converts to it. There are many ways to formulate type safe abstract syntax trees. I've chosen to use GADTs. I've also picked to represent variables (still) by identifiers, which means that the syntax tree is not necessarily type safe. Furthermore, I've chosen a very limited way to represent function application since this is all I need for this example. The variantions on this are endless.
{-# LANGUAGE GADTs, ExistentialQuantification, PatternGuards #-}
module TExp(Id,
TFun(..), TTyp(..), TExp(..), DblOp(..), BolOp(..), CmpOp(..),
Equal(..), test,
Type(..),
AFun(..), extractFun,
typeCheck, toTFun) where
import Data.Maybe
import Control.Monad
import UExp
data TFun a where
TBody :: TExp a -> TFun a
TLam :: Id -> TTyp a -> TFun b -> TFun (a->b)
data TTyp a where
TTBol :: TTyp Bool
TTDbl :: TTyp Double
TTArr :: TTyp a -> TTyp b -> TTyp (a->b)
data TExp a where
TDbl :: Double -> TExp Double
TBol :: Bool -> TExp Bool
TDblOp :: DblOp -> TExp Double -> TExp Double -> TExp Double
TBolOp :: BolOp -> TExp Bool -> TExp Bool -> TExp Bool
TCmpOp :: CmpOp -> TExp Double -> TExp Double -> TExp Bool
TIf :: TExp Bool -> TExp a -> TExp a -> TExp a
TLet :: Id -> TTyp a -> TExp a -> TExp b -> TExp b
TVar :: Id -> TExp a
data DblOp = DAdd | DSub | DMul | DDiv
deriving (Eq, Show)
data BolOp = BAnd | BOr
deriving (Eq, Show)
data CmpOp = CEq | CLe
deriving (Eq, Show)
So for instance,
UApp "+" [UVar "x", UDbl 2.2] will be represented by
TDblOp DAdd (TVar "x") (TDbl 2.2) which has type
TExp Double. So the type of the expression is now accurately reflected in the type of the syntax tree. Even the
UTyp type now has a typed equivalent where the real type is reflected. For completeness, here's some code for pretty printing etc.
{-# LANGUAGE GADTs, ExistentialQuantification, PatternGuards #-}
module TExp(Id,
TFun(..), TTyp(..), TExp(..), DblOp(..), BolOp(..), CmpOp(..),
Equal(..), test,
Type(..),
AFun(..), extractFun,
typeCheck, toTFun) where
import Data.Maybe
import Control.Monad
import UExp
instance Show (TFun a) where
showsPrec p (TBody e) = showsPrec p e
showsPrec p (TLam i t e) = showParen (p>0)
(showString "\\ " . showParen True (showString i . showString " :: " . showsPrec 0 t) . showString " -> " . showsPrec 0 e)
instance Show (TTyp a) where
showsPrec _ TTBol = showString "Bool"
showsPrec _ TTDbl = showString "Double"
showsPrec p (TTArr a b) = showParen (p>5) (showsPrec 6 a . showString " -> " . showsPrec 5 b)
instance Show (TExp a) where
showsPrec p (TDbl d) = showsPrec p d
showsPrec p (TBol b) = showsPrec p b
showsPrec _ (TVar i) = showString i
showsPrec p (TDblOp op a b) = showOp p (fromJust $ lookup op [(DMul, "*"), (DAdd, "+"), (DSub, "-"), (DDiv, "/")]) a b
showsPrec p (TBolOp op a b) = showOp p (fromJust $ lookup op [(BAnd, "&&"), (BOr, "||")]) a b
showsPrec p (TCmpOp op a b) = showOp p (fromJust $ lookup op [(CEq, "=="), (CLe, "<=")]) a b
showsPrec p (TIf c t e) = showParen (p>0) (showString "if " . showsPrec 0 c . showString " then " . showsPrec 0 t . showString " else " . showsPrec 0 e)
showsPrec p (TLet i _ e b) =
showParen (p>0) (showString "let " . showString i . showString " = " . showsPrec 0 e . showString " in " . showsPrec 0 b)
The aim of the type checker is to transform from the
UExp type to the
TExp type, so basically
typeCheckExp :: UExp -> TExp a
But things can go wrong, so it's impossible to always return a
TExp, so let's use a
Maybe type:
typeCheckExp :: UExp -> Maybe (TExp a)
But wait! This type is totally wrong. Why? Because it promises that given a
UExp the type checker can return
any type, i.e., writing out the (normally implicit) quantifier the type is:
typeCheckExp :: forall a . UExp -> Maybe (TExp a)
But this is not the case, the type checker will figure out a type and return an expression with this specific type, so the type we really want is
typeCheckExp :: exists a . UExp -> Maybe (TExp a)
Haskell doesn't allow this type to be written this way; we need to package up the existential type in a data type. Like so:
data ATExp = forall a . TExp a ::: TTyp a
data AFun = forall a . AFun (TFun a) (TTyp a)
It might look funny that the existential type is written with a forall, but it makes sense when looking at the type of the constructor function (but not when doing pattern matching). Now we can attempt a couple of cases of the type checker:
typeCheckExp :: UExp -> Maybe ATExp
typeCheckExp (UDbl d) =
return $ TDbl d ::: TTDbl
typeCheckExp (UBol b) =
return $ TBol b ::: TTBol
They look quite nice, and they actually work. So what about something more complicated, like arithmetic?
typeCheckExp (UApp op [a, b]) | Just dop <- lookup op [("+", DAdd), ("-", DSub), ("*", DMul), ("/", DDiv)] = do
a' ::: TTDbl <- typeCheckExp a
b' ::: TTDbl <- typeCheckExp b
return $ TDblOp dop a' b' ::: TTDbl
First we conveniently look up the operator among the arithmetic operators, then we recursively call the type checker for the operands. We do this in the
Maybe monad. If the type checking a subterm fails that's automatically propagated, and furthermore, if the type checking of a subterm does not yield a
TTDbl type then this will cause the pattern matching to fail, and this will generate a
Nothing in the maybe monad, so we used failing pattern matching to our advantage here.
The interesting case is checking
UIf, because here both arms have to have the same type, but we don't know which one. Here's an attempt:
typeCheckExp (UApp "if" [c,t,e]) = do
c' ::: TTBol <- typeCheckExp c
t' ::: tt <- typeCheckExp t
e' ::: te <- typeCheckExp e
guard (tt == te)
return $ TIf c' t' e' ::: tt
But this doesn't type check. The guard ensures that the two arms have the same type, but that's something we know, but the Haskell type checker doesn't. So it rejects the
TIf, because it can't see that both arms have the same type. We need to be trickier in doing the equality test so that it reflects the equality on the type level. There's a standard trick for this, namely this type:
data Equal a b where
Eq :: Equal a a
If you ever have a value (which must be
Eq) of type
Equal foo bar then the type checker will know that
foo and
bar are actually the same type. So let's code equality for
TTyp.
test :: TTyp a -> TTyp b -> Maybe (Equal a b)
test TTBol TTBol = return Eq
test TTDbl TTDbl = return Eq
test (TTArr a b) (TTArr a' b') = do
Eq <- test a a'
Eq <- test b b'
return Eq
test _ _ = mzero
This code is worth pondering for a while, it's actually rather clever (I take no credit for it; I stole it from Tim Sheard). Why does even the first clause type check? Because
TTBol has type
TTyp Bool, so both the type variables (a and b) must be
Bool in the first clause, which means that
Eq :: Equal Bool Bool is what we're returning.
Equipped with this equality we can try type checking again.
typeCheckExp (UApp "if" [c,t,e]) = do
c' ::: TTBol <- typeCheckExp c
t' ::: tt <- typeCheckExp t
e' ::: te <- typeCheckExp e
Eq <- test tt te
return $ TIf c' t' e' ::: tt
And amazingly this actually works! (A tribute to the hard working ghc implementors.)
One (rather large) fly is left in the ointment. What about variables? What do we do when we type check
UVar? We must check that there's a bound variable with the right type around. So the type checker needs to be extended with an environment where variables can be looked up. It's mostly straight forward. The environment simply maps a variable to
ATExp. So here's the complete type checker as it's actually defined.
type Env = [(Id, ATExp)]
typeCheckExp :: Env -> UExp -> Maybe ATExp
typeCheckExp _ (UDbl d) =
return $ TDbl d ::: TTDbl
typeCheckExp _ (UBol b) =
return $ TBol b ::: TTBol
typeCheckExp r (UApp op [a, b]) | Just dop <- lookup op [("+", DAdd), ("-", DSub), ("*", DMul), ("/", DDiv)] = do
a' ::: TTDbl <- typeCheckExp r a
b' ::: TTDbl <- typeCheckExp r b
return $ TDblOp dop a' b' ::: TTDbl
typeCheckExp r (UApp op [a, b]) | Just bop <- lookup op [("&&", BAnd), ("||", BOr)] = do
a' ::: TTBol <- typeCheckExp r a
b' ::: TTBol <- typeCheckExp r b
return $ TBolOp bop a' b' ::: TTBol
typeCheckExp r (UApp op [a, b]) | Just cop <- lookup op [("==", CEq), ("<=", CLe)] = do
a' ::: TTDbl <- typeCheckExp r a
b' ::: TTDbl <- typeCheckExp r b
return $ TCmpOp cop a' b' ::: TTBol
typeCheckExp r (UApp "if" [c,t,e]) = do
c' ::: TTBol <- typeCheckExp r c
t' ::: tt <- typeCheckExp r t
e' ::: te <- typeCheckExp r e
Eq <- test tt te
return $ TIf c' t' e' ::: tt
typeCheckExp r (ULet i e b) = do
e' ::: te <- typeCheckExp r e
b' ::: tb <- typeCheckExp ((i, TVar i ::: te) : r) b
return $ TLet i te e' b' ::: tb
typeCheckExp r (UVar i) =
lookup i r
typeCheckExp _ _ =
mzero
Note the
ULet case which extends the environment. First we type check the expression that's being bound, and then add a variable to the environment and type check the body.
Finally we need to type check the top level:
typeCheck :: UFun -> Maybe AFun
typeCheck = typeCheckFun []
typeCheckFun :: Env -> UFun -> Maybe AFun
typeCheckFun n (UFun [] b) = do
e ::: t <- typeCheckExp n b
return $ AFun (TBody e) t
typeCheckFun n (UFun ((x, typ):vts) b) =
case typ of
UTBol -> f TTBol
UTDbl -> f TTDbl
where f t = do AFun e r <- typeCheckFun ((x, TVar x ::: t) : n) (UFun vts b); return $ AFun (TLam x t e) (TTArr t r)
When encountering the expression we just type check it, and for an argument we add a variable with the right type to the environment.
A small test in ghci:
TExp UExp> mParseUFun "\\ (x::Double) -> x+1" >>= typeCheck
Just (\ (x :: Double) -> x+1.0 :: Double -> Double)
To be able to extract a function from
ATFun we need some small utilties.
class Type a where
theType :: TTyp a
instance Type Double where
theType = TTDbl
instance Type Bool where
theType = TTBol
instance (Type a, Type b) => Type (a->b) where
theType = TTArr theType theType
extractFun :: (Type a) => AFun -> Maybe (TFun a)
extractFun = extract theType
extract :: TTyp a -> AFun -> Maybe (TFun a)
extract s (AFun e t) = do
Eq <- test t s
return e
toTFun :: (Type a) => UFun -> Maybe (TFun a)
toTFun = extractFun <=< typeCheck
The class
Type allows us to construct the
TTyp corresponding to a Haskell type via overloading. Using this and the
test function we can then extract a
TFun at any type we like. If we try to extract at the wrong type we'll just get
Nothing and at the right type we get
Just.
The Compiler module
Now all we need to do is to write a function
translate that translates a
TFun a into the corresponding
a. Naturally, using LLVM.
Let's start with some simple cases in translating literals to LLVM code.
compileExp :: TExp a -> CodeGenFunction r (Value a)
compileExp (TDbl d) = return $ valueOf d
compileExp (TBol b) = return $ valueOf b
The
valueOf function is simply the one that lifts a Haskell value into an LLVM value. Note how nice the GADT works out here and we handle both Double and Bool with any need to compromise type safety.
What about arithmetic? Equally easy.
compileExp r (TDblOp op e1 e2) = bind2 (dblOp op) (compileExp r e1) (compileExp r e2)
dblOp :: DblOp -> Value Double -> Value Double -> CodeGenFunction r (Value Double)
dblOp DAdd = add
dblOp DSub = sub
dblOp DMul = mul
dblOp DDiv = fdiv
-- This should be in Control.Monad
bind2 :: (Monad m) => (a -> b -> m c) -> m a -> m b -> m c
bind2 f m1 m2 = do
x1 <- m1
x2 <- m2
f x1 x2
And we can just carry on:
compileExp (TBolOp op e1 e2) = bind2 (bolOp op) (compileExp e1) (compileExp e2)
compileExp (TCmpOp op e1 e2) = bind2 (cmpOp op) (compileExp e1) (compileExp e2)
compileExp (TIf b t e) = mkIf (compileExp b) (compileExp t) (compileExp e)
bolOp :: BolOp -> Value Bool -> Value Bool -> CodeGenFunction r (Value Bool)
bolOp BAnd = and
bolOp BOr = or
cmpOp :: CmpOp -> Value Double -> Value Double -> CodeGenFunction r (Value Bool)
cmpOp CEq = fcmp FPOEQ
cmpOp CLe = fcmp FPOLE
(The
&& and
|| are not short circuiting in this implementation. It would be easy to change.)
It's rather amazing that despite these different branches producing and consuming different types it all works out. It's perfectly type safe and free from coercions. This is the beauty of GADTs.
Oh, yeah,
mkIf. It's just a piece of mess to create some basic blocks, test, and jump.
mkIf :: (IsFirstClass a) =>
CodeGenFunction r (Value Bool) -> CodeGenFunction r (Value a) -> CodeGenFunction r (Value a) -> CodeGenFunction r (Value a)
mkIf mb mt me = do
b <- mb
tb <- newBasicBlock
eb <- newBasicBlock
jb <- newBasicBlock
condBr b tb eb
defineBasicBlock tb
t <- mt
br jb
defineBasicBlock eb
e <- me
br jb
defineBasicBlock jb
phi [(t, tb), (e, eb)]
OK, so was lying. The
translate function is not quite as easy as that. Just as with type checking we need an environment because of variables. It's easy to add though, and here's the real code.
compileExp :: (Type a, IsFirstClass a) => Env -> TExp a -> CodeGenFunction r (Value a)
compileExp _ (TDbl d) = return $ valueOf d
compileExp _ (TBol b) = return $ valueOf b
compileExp r (TDblOp op e1 e2) = bind2 (dblOp op) (compileExp r e1) (compileExp r e2)
compileExp r (TBolOp op e1 e2) = bind2 (bolOp op) (compileExp r e1) (compileExp r e2)
compileExp r (TCmpOp op e1 e2) = bind2 (cmpOp op) (compileExp r e1) (compileExp r e2)
compileExp r (TIf b t e) = mkIf (compileExp r b) (compileExp r t) (compileExp r e)
compileExp r (TLet i t e b) = do
e' <- compileExp' r t e
compileExp ((i, AValue e' t):r) b
compileExp r (TVar i) = return $ fromJust $ castAValue theType =<< lookup i r -- lookup cannot fail on type checked code
compileExp' :: Env -> TTyp a -> TExp a -> CodeGenFunction r (Value a)
compileExp' r TTDbl e = compileExp r e
compileExp' r TTBol e = compileExp r e
compileExp' _ _ _ = error $ "compileExp': functions not allowed yet"
data AValue = forall a . AValue (Value a) (TTyp a)
castAValue :: TTyp a -> AValue -> Maybe (Value a)
castAValue t (AValue v s) = do
Eq <- test t s
return v
type Env = [(Id, AValue)]
Exactly as for the type checking environment we stick the code generation in an environment, and use
castAValue project it out of the existential container. The
fromJust call in the
TVar case cannot fail on type checked code, but with my string based variable representation I have no evidence of this in the
TExp so there's actually a cast in the variable case that can fail if scope and type checking has not been performed. The
compileExp' is placate the type checker and help it with some evidence about that we are only binding base values.
The rest of the code generation module is just house keeping. It's a little ugly, but not terrible.
-- | Compile a TFun into the corresponding LLVM code.
compileFunction :: (Translate a) =>
TFun a -> CodeGenModule (Function (RetIO a))
compileFunction = createFunction ExternalLinkage . compileFun []
class Compile a where
type CG a
type RetIO a
type Returns a
compileFun :: Env -> TFun a -> CG a
instance Compile Double where
type CG Double = CodeGenFunction Double ()
type RetIO Double = IO Double
type Returns Double = Double
compileFun r (TBody e) = compileExp r e >>= ret
-- TLam is not well typed
instance Compile Bool where
type CG Bool = CodeGenFunction Bool ()
type RetIO Bool = IO Bool
type Returns Bool = Bool
compileFun r (TBody e) = compileExp r e >>= ret
-- TLam is not well typed
instance (Type a, Compile b) => Compile (a -> b) where
type CG (a->b) = Value a -> CG b
type RetIO (a->b) = a -> RetIO b
type Returns (a->b) = Returns b
-- TBody is not well typed
compileFun r (TLam i t e) = \ x -> compileFun ((i, AValue x t):r) e
The verbosity and large number of type functions in this section has convinced me that I need to simplify some of the types and classes involved in the LLVM code generation.
To convert and LLVM module we call the JIT. This produces a function that returns a value in the IO monad (to be on the safe side) so we need to get rid of the IO, and finally we can get rid of the top level IO, because externally what we are doing is really pure (in some sense).
translate :: (Translate a) => TFun a -> a
translate = unsafePerformIO . fmap unsafePurify . simpleFunction . compileFunction
The
Translate context is just an abbreviation for a big context enforced by the LLVM functions. It looks horrendous, but the type checker figured it out for me and I just pasted it in.
{-# LANGUAGE TypeFamilies, FlexibleContexts, ExistentialQuantification, FlexibleInstances, UndecidableInstances #-}
module Compile(Translate, translate) where
import Data.Maybe
import Prelude hiding (and, or)
import TExp
import LLVM.Core hiding (CmpOp)
import LLVM.ExecutionEngine
import System.IO.Unsafe(unsafePerformIO)
class (Type a,
Unsafe (RetIO a) a,
FunctionArgs (RetIO a) (CG a) (CodeGenFunction (Returns a) ()),
IsFunction (RetIO a),
Compile a,
Translatable (RetIO a)) =>
Translate a
instance (Type a,
Unsafe (RetIO a) a,
FunctionArgs (RetIO a) (CG a) (CodeGenFunction (Returns a) ()),
IsFunction (RetIO a),
Compile a,
Translatable (RetIO a)) =>
Translate a
Conclusion
And that concludes the three parts of the compiler. In about 400 lines of code we can compile a small subset of Haskell expressions to (efficient) machine code. After type checking the rest of the processing is done in a type safe manner (except for a cast in
TVar) which is the intention of the high level LLVM interface.
Oh, and if you instrument the code generator a little you can peek at the machine code being produced. For instance, for this input to
compile
\ (x::Double) ->
let y = x*(x-1) in
let z = x/y + 1 in
if y <= 0 then 0 else 1/(y-z)
we get
__fun1:
subl $12, %esp
movsd LCPI1_0, %xmm0
movsd 16(%esp), %xmm1
movapd %xmm1, %xmm2
subsd %xmm0, %xmm2
mulsd %xmm1, %xmm2
pxor %xmm3, %xmm3
ucomisd %xmm2, %xmm3
jae LBB1_3
LBB1_1:
divsd %xmm2, %xmm1
addsd %xmm0, %xmm1
subsd %xmm1, %xmm2
movsd LCPI1_0, %xmm0
divsd %xmm2, %xmm0
LBB1_2:
movsd %xmm0, (%esp)
fldl (%esp)
addl $12, %esp
ret
LBB1_3:
pxor %xmm0, %xmm0
jmp LBB1_2
I love the Maybe (Equal a b) monadic syntax trick. Although it appears magic, here's how it works:
ReplyDeletedo {Eq <- expr ; ...}
desugars into
expr >>= \e -> case e of { Eq -> do {...} }
On the inside of the case, we have successfully pattern-matched on Eq :: Equal a b, so the type equality (a ~ b) is in scope; we can use the two types interchangeably! But the inside of the case is the entire rest of the "do" block, allowing "return Eq" to work.
Thanks so much for this extensive post! I was the fellow on #haskell :) I went another route (simpler, with much less type-safety), but this will definitely be on my mind when I go to refactor my existing stuff. Thanks again!
ReplyDeleteNice post, thanks for the insights!
ReplyDeleteThanks for posting this. It's nice to see such a complete example.
ReplyDeleteThis comment has been removed by the author.
ReplyDeleteI was confused by:
ReplyDeleteBecause TTBol has type TTyp Bool, so both the type variables (a and b) must be TTBool in the first clause, which means that Eq :: Equal TBol TBol is what we're returning.
TTBool does not exist, and TBol is a data constructor, not a type. Did you mean that both type variables (a and b) must be Bool in the first clause, which means that Eq :: Equal Bool Bool is what we're returning?
TTBol exists and is a data constructor and it has type TTyp Bool.
ReplyDeleteI've fixed the typos.
ReplyDeleteI've fixed the typos.
ReplyDeleteI was bitten by LLVM curiosity and decided to try this out as a way to explore Haskell's LLVM bindings.
ReplyDeleteAll the code appears to compile on GHC 7.4.2, except for the test code as given at the top of the article.
I get:
Couldn't match type `Double' with `CodeGenFunction Double ()'
When using functional dependencies to combine
FunctionArgs
(IO Double) (llvm-3.0.1.0:LLVM.Core.CodeGen.FA Double) Double,
arising from the dependency `f -> g r'
in the instance declaration in `llvm-3.0.1.0:LLVM.Core.CodeGen'
FunctionArgs
(IO Double)
(CodeGenFunction Double ())
(CodeGenFunction Double ()),
arising from a use of `compile' at src/Compile.hs:144:18-24
In the expression:
compile "\\ (x::Double) -> if x == 0 then 0 else 1/(x*x)"
In a pattern binding:
Just f = compile "\\ (x::Double) -> if x == 0 then 0 else 1/(x*x)"
At this stage my type-fu isn't strong enough for an 'ah-ha' just looking at this (along with my unfamiliarity of all the LLVM and compiler types here).
I'm hoping this isn't anything dumb on my part, but I don't think I've missed anything.
Any help greatly appreciated as I'd love to use this as a basis for experimentation. Thanks!