Friday, February 06, 2009

Is Haskell fast?

Let's do a simple benchmark comparing Haskell to C. My benchmark computes an approximation to infinity by adding up 1/n. Here is the C code:
#include <stdio.h>

int
main(int argc, char **argv)
{
  double i, s;
  s = 0;
  for (i = 1; i < 100000000; i++)
    s += 1/i;
  printf("Almost infinity is %g\n", s);
}
And running it
Lennarts-Computer% gcc -O3 inf.c -o inf
Lennarts-Computer% time ./inf
Almost infinity is 18.9979
1.585u 0.009s 0:01.62 97.5%     0+0k 0+0io 0pf+0w
And now the Haskell code:
import BASIC
main = runBASIC' $ do
    10 LET I =: 1
    20 LET S =: 0
    30 LET S =: S + 1/I
    40 LET I =: I + 1
    50 IF I <> 100000000 THEN 30
    60 PRINT "Almost infinity is"
    70 PRINT S
    80 END
And running it:
Lennarts-Computer% ghc --make Main.hs
[4 of 4] Compiling Main             ( Main.hs, Main.o )
Linking Main ...
Lennarts-Computer% ./Main
Almost infinity is
18.9979
CPU time:   1.57s
As you can see it's about the same time. In fact the assembly code for the loops look pretty much the same. Here's the Haskell one:
LBB1_1: ## _L4
        movsd   LCPI1_0, %xmm2
        movapd  %xmm1, %xmm3
        addsd   %xmm2, %xmm3
        ucomisd LCPI1_1, %xmm3
        divsd   %xmm1, %xmm2
        addsd   %xmm2, %xmm0
        movapd  %xmm3, %xmm1
        jne     LBB1_1  ## _L4

11 comments:

  1. And the assembly we get from GHC with just the obvious recursive implementation in Haskell,

    go:
    comisd .LC0(%rip), %xmm5
    jb .L4
    movapd %xmm6, %xmm5
    jmp *(%rbp)
    .L4:
    movsd .LC1(%rip), %xmm0
    movapd %xmm0, %xmm7
    divsd %xmm5, %xmm7
    addsd %xmm0, %xmm5
    addsd %xmm7, %xmm6
    jmp go

    Not too bad, but the jump in the middle is a bit awkward.

    ReplyDelete
  2. Playing around with the C example a bit, switching to int from double changes the speeds drastically.

    Any idea why that happens?

    ReplyDelete
  3. Jonathan: it doesn't really do anything interesting with int instead of double.

    ReplyDelete
  4. Playing around with the C example a bit, putting a return at the beginning of main changes the speeds drastically.

    Any idea why that happens?

    I truly am sorry, but that was just far too tempting.

    ReplyDelete
  5. @Lennart

    No doubt, it makes the benchmark nonsensical. I more meant "what happens with a similarly styled benchmark that uses ints instead of doubles?". I didn't know if one possibility was that gcc was particularly good at optimizing integer operations (for whatever reason)

    @David

    Nice.

    ReplyDelete
  6. What is behind the BASIC module? Or is it a secret? :)

    ReplyDelete
  7. You can get a little bit more speed out of the c code by rewriting the loop as follows:

    for (i = 0; i < 100000000;)
    s += 1/++i;

    ReplyDelete
  8. "Labels: BASIC, ..." I am looking forward to many more posts labeled BASIC.

    ReplyDelete
  9. Jonathan: gcc optimized the double and int loops in a very similar way (as you can see from looking at the assembly code), it's just that int operations are much faster.

    ReplyDelete
  10. David: Yes, moving the increment gives about 1-2% improvment, probably because the ucomisd instruction can be moved two instructions before the ja, so there's less of a pipeline bubble when the jump happens.

    ReplyDelete
  11. Here's what I got for the C code:

    L2:
    movapd %xmm3, %xmm0
    divsd %xmm1, %xmm0
    addsd %xmm0, %xmm2
    addsd %xmm3, %xmm1
    ucomisd %xmm1, %xmm4
    ja L2

    If I make i an int (idiom) and use "s += 1.0/i" I get a minor speed-up (about 2%). The assembly now looks like:

    L2:
    cvtsi2sd %eax, %xmm0
    movapd %xmm2, %xmm3
    divsd %xmm0, %xmm3
    addsd %xmm3, %xmm1
    addl $1, %eax
    cmpl $500000000, %eax
    jne L2


    Anyway, that's some pretty decent code generation for Haskell-BASIC.

    ReplyDelete