Tuesday, June 26, 2007

A simple compiler In my last post I played a little with Harpy to generate code from what looked like assembly language. So the next logical step is to make a compiler. And I mean a real machine-code compiler, not some wimpy byte-code nonsense. To make life simple (it is a simple compiler, after all) I'm going to start by generating code that uses the stack all the time. So all operands will live on the stack and all operations on them will push and pop the stack. Of course, the code generated from such a compiler will make any serious compiler writer weep. The CodeGen type in Harpy is the monad used during code generation. It allows you to keep some extra information around besides the generated code; it gives you access to a state monad. I will use the state to keep track of the current stack depth. (We will see why soon.)
type StackDepth = Int
type Gen = CodeGen () StackDepth ()

addDepth :: StackDepth -> Gen
addDepth i = do
    d <- getState
    setState (d+i)
The addDepth function changes the current stack depth by grabbing the old one, adding the argument and storing it back. The getState and setState functions don't generate any code, they just manipulate the state available in the CodeGen monad. With that out of the way, let's implement code generation for addition.
gadd :: Gen
gadd = do
    pop  ebx
    pop  eax
    add  eax ebx
    push eax
    addDepth (-1)
It pops the two operands off the stack, adds them, and pushes the result. The net effect on the stack is that it has one word less on it (I count my stack depth in words), so there's also a call to addDepth. Subtraction and multiplication are very similar.
gsub :: Gen
gsub = do
    pop  ebx
    pop  eax
    sub  eax ebx
    push eax
    addDepth (-1)

gmul :: Gen
gmul = do
    pop  ebx
    pop  eax
    imul InPlace eax ebx
    push eax
    addDepth (-1)
On the i386 (signed) division and remainder is computed with the idiv instruction. It divides the EDX:EAX 64 bit number with the given operand. So we must convert a 32 signed number to a 64 bit signed number, this is simply done by moving EAX to EDX and then shifting right 31 steps. This will copy the sign bit into every bit of EDX. Depending of if we want the quotient or remainder we need to push EAX or EDX.
gquot :: Gen
gquot = do
    gquotRem
    push eax
    addDepth (-1)

grem :: Gen
grem = do
    gquotRem
    push edx
    addDepth (-1)

gquotRem :: Gen
gquotRem = do
    pop  ebx
    pop  eax
    mov  edx eax
    sar  edx (31 :: Word8)
    idiv ebx
To put a constant on the stack we simply push it and increment the remembered stack depth.
gconst :: Int -> Gen
gconst c = do
    push (fromIntegral c :: Word32)
    addDepth 1
OK, so now for something more interesting. Assuming we are generating code for a function, we also want to access the arguments to the function. Where are the arguments? Well, according to the IA32 calling conventions the caller pushes the arguments on the stack, so we'll follow those. First we have a bunch things on the stack, how many is kept track of in the stack depth in the CodeGen monad, and then the arguments follow in order, pushed right-to-left. So to get an argument we compute the offset, convert it to a byte offset, and push that word on the stack.
-- Get the Nth argument
gargN :: Int -> Gen
gargN n = do
    d <- getState
    let o = 4 * (d + n)
    mov  eax (Disp (fromIntegral o), esp)
    push eax
    addDepth 1
When generating code for a function we should not clobber any callee-save registers, so to be on the safe side we save all used registers on function entry and restore them on function exit. On function exit we also return the result in EAX.
savedRegs = [ebx, edx]

-- Push all register we'll be using, except eax.
gprologue :: Gen
gprologue = do
    mapM_ push savedRegs

-- Pop return value to eax, restore regs, and return
gepilogue :: Gen
gepilogue = do
    pop  eax
    mapM_ pop (reverse savedRegs)
    ret
OK, so that was a lot of stuff, let's put it together for a test.
testGen = conv_Word32ToWord32 () (length savedRegs + 1) $ do
    gprologue
    gargN 0
    gconst 1
    gadd
    gepilogue

main = do
    test <- testGen
    print (test 10)
The testGen function generates the prologue, push argument, push 1, add, and the epilogue. The conv_Word32ToWord32 (from my previous post) converts the machine code to a Haskell function. We also have to give the start value of the stack depth. The stack initially contains the return address and the saved registers, so that's the number we pass. Running this gives 11, as it should. OK, so let's actually write a compiler and not just a code generator. Here is a data type for integer expressions.
data Exp
    = Con Int
    | Arg Int
    | BinOp BOp Exp Exp
    deriving (Show)

data BOp = Add | Sub | Mul | Quot | Rem
    deriving (Show)
We have constants, arguments (variables), and a few binary operators. It's all easily translated to machine code.
translate :: Exp -> Gen
translate (Con c) = gconst c
translate (Arg n) = gargN n
translate (BinOp op x y) = do translate x; translate y; binop op
  where binop Add = gadd
        binop Sub = gsub
        binop Mul = gmul
        binop Quot = gquot
        binop Rem = grem
For simplicity, let's compile only functions of one argument for now.
compileIOW32 :: (Exp -> Exp) -> IO (Word32 -> Word32)
compileIOW32 f = conv_Word32ToWord32 () (length savedRegs + 1) $ do
    gprologue
    translate (f (Arg 0))
    gepilogue 
This function takes an Exp->Exp function, by giving this function the argument Arg 0 we get an expression to translate. We tack on the usual prologue and epilogue. So let's try it.
main = do
    let fun x = BinOp Add x (Con 1)
    test <- compileIOW32 fun
    print (test 10)
Which prints 11. But yuck, writing BinOp etc. isn't nice. Let's make some instances.
instance Eq Exp
instance Ord Exp

instance Num Exp where
    x + y  =  BinOp Add x y
    x - y  =  BinOp Sub x y
    x * y  =  BinOp Mul x y
    fromInteger i = Con (fromInteger i)

instance Enum Exp
instance Real Exp

instance Integral Exp where
    quot x y = BinOp Quot x y
    rem x y = BinOp Rem x y
And give it a whirl:
main = do
    let fun x = (x+1) * x `quot` 2
    test <- compileIOW32 fun
    print (test 10)
And this prints 55, as expected. Not bad, a compiler from (a tiny subset of) Haskell functions to machine code in a few pages. But I do admit being embarrassed about generating such incredibly poor code. But there's always future blog posts to rectify that.

4 comments:

  1. Seeing compilers and language tools written in Haskell always makes me jealous. I got to write some parsers and an interpreter in Haskell in school but I haven't touched the language since. Now I work on Eclipse which is all Java. The sheer amount of code required to do this stuff in Java is mind-boggling. The visitor pattern alone on our AST requires hundreds of lines of boilerplate code. I miss Haskell, its such an elegant language.

    ReplyDelete
  2. But you can't get anything done in Haskell. You can't talk to Oracle on AIX or parse HTML created in 1994 or some shit, which is crucial for every single application any software company ever creates, especially compilers.

    Besides, REFACTORING! Writing thousands of lines of boilerplate code is fantastic because of refactoring. Now shut up and like it, it's the enterprise standard!

    ReplyDelete
  3. @warren : Yea, but that's not really Haskell's fault is it?

    I guess if you could gather 30 or so programmers, with enough language knowledge and some free time you could get interfaces to whatever you wanted.

    Same goes for Lisp and all those other nice languages people hear about only in tales.

    Too bad most programmers are haters, and don't contribute to libraries for struggling languages :|

    ReplyDelete
  4. This is so deja vu for me, because in my Programming Languages class at UCSB, we wrote a compiler in Haskell. It compiled from a fake Pascal-like language into a stack-based language. Similar to what you are doing above, however I couldn't grok monads so I didn't use them.

    Still want to go back and learn those tricky "monad" thingies.

    ReplyDelete