Things that amuse me

Friday, February 06, 2009

Is Haskell fast?

Let's do a simple benchmark comparing Haskell to C. My benchmark computes an approximation to infinity by adding up 1/n. Here is the C code:

#include <stdio.h>

int
main(int argc, char **argv)
{
  double i, s;
  s = 0;
  for (i = 1; i < 100000000; i++)
    s += 1/i;
  printf("Almost infinity is %g\n", s);
}

And running it

Lennarts-Computer% gcc -O3 inf.c -o inf
Lennarts-Computer% time ./inf
Almost infinity is 18.9979
1.585u 0.009s 0:01.62 97.5%     0+0k 0+0io 0pf+0w

And now the Haskell code:

import BASIC
main = runBASIC' $ do
    10 LET I =: 1
    20 LET S =: 0
    30 LET S =: S + 1/I
    40 LET I =: I + 1
    50 IF I <> 100000000 THEN 30
    60 PRINT "Almost infinity is"
    70 PRINT S
    80 END

And running it:

Lennarts-Computer% ghc --make Main.hs
[4 of 4] Compiling Main             ( Main.hs, Main.o )
Linking Main ...
Lennarts-Computer% ./Main
Almost infinity is
18.9979
CPU time:   1.57s

As you can see it's about the same time. In fact the assembly code for the loops look pretty much the same. Here's the Haskell one:

LBB1_1: ## _L4
        movsd   LCPI1_0, %xmm2
        movapd  %xmm1, %xmm3
        addsd   %xmm2, %xmm3
        ucomisd LCPI1_1, %xmm3
        divsd   %xmm1, %xmm2
        addsd   %xmm2, %xmm0
        movapd  %xmm3, %xmm1
        jne     LBB1_1  ## _L4

11 comments:

Don StewartFriday, February 6, 2009 at 7:09:00 PM GMT
And the assembly we get from GHC with just the obvious recursive implementation in Haskell,

go:
comisd .LC0(%rip), %xmm5
jb .L4
movapd %xmm6, %xmm5
jmp *(%rbp)
.L4:
movsd .LC1(%rip), %xmm0
movapd %xmm0, %xmm7
divsd %xmm5, %xmm7
addsd %xmm0, %xmm5
addsd %xmm7, %xmm6
jmp go

Not too bad, but the jump in the middle is a bit awkward.
ReplyDelete
Replies
Jonathan TurnerFriday, February 6, 2009 at 7:14:00 PM GMT
Playing around with the C example a bit, switching to int from double changes the speeds drastically.

Any idea why that happens?
ReplyDelete
Replies
augustssFriday, February 6, 2009 at 7:19:00 PM GMT
Jonathan: it doesn't really do anything interesting with int instead of double.
ReplyDelete
Replies
David DillonFriday, February 6, 2009 at 7:58:00 PM GMT
Playing around with the C example a bit, putting a return at the beginning of main changes the speeds drastically.

Any idea why that happens?

I truly am sorry, but that was just far too tempting.
ReplyDelete
Replies
Jonathan TurnerFriday, February 6, 2009 at 8:09:00 PM GMT
@Lennart

No doubt, it makes the benchmark nonsensical. I more meant "what happens with a similarly styled benchmark that uses ints instead of doubles?". I didn't know if one possibility was that gcc was particularly good at optimizing integer operations (for whatever reason)

@David

Nice.
ReplyDelete
Replies
UnknownFriday, February 6, 2009 at 11:42:00 PM GMT
What is behind the BASIC module? Or is it a secret? :)
ReplyDelete
Replies
David WalthourSaturday, February 7, 2009 at 3:17:00 AM GMT
You can get a little bit more speed out of the c code by rewriting the loop as follows:

for (i = 0; i < 100000000;)
s += 1/++i;
ReplyDelete
Replies
UnknownSaturday, February 7, 2009 at 4:32:00 AM GMT
"Labels: BASIC, ..." I am looking forward to many more posts labeled BASIC.
ReplyDelete
Replies
augustssSaturday, February 7, 2009 at 10:59:00 AM GMT
Jonathan: gcc optimized the double and int loops in a very similar way (as you can see from looking at the assembly code), it's just that int operations are much faster.
ReplyDelete
Replies
augustssSaturday, February 7, 2009 at 11:01:00 AM GMT
David: Yes, moving the increment gives about 1-2% improvment, probably because the ucomisd instruction can be moved two instructions before the ja, so there's less of a pipeline bubble when the jump happens.
ReplyDelete
Replies
jagSunday, February 8, 2009 at 12:56:00 PM GMT
Here's what I got for the C code:

L2:
movapd %xmm3, %xmm0
divsd %xmm1, %xmm0
addsd %xmm0, %xmm2
addsd %xmm3, %xmm1
ucomisd %xmm1, %xmm4
ja L2

If I make i an int (idiom) and use "s += 1.0/i" I get a minor speed-up (about 2%). The assembly now looks like:

L2:
cvtsi2sd %eax, %xmm0
movapd %xmm2, %xmm3
divsd %xmm0, %xmm3
addsd %xmm3, %xmm1
addl $1, %eax
cmpl $500000000, %eax
jne L2

Anyway, that's some pretty decent code generation for Haskell-BASIC.
ReplyDelete
Replies

Add comment