Is Haskell fast?
Let's do a simple benchmark comparing Haskell to C. My benchmark computes an approximation to infinity by adding up 1/n. Here is the C code:
#include <stdio.h>
int
main(int argc, char **argv)
{
double i, s;
s = 0;
for (i = 1; i < 100000000; i++)
s += 1/i;
printf("Almost infinity is %g\n", s);
}
And running it
Lennarts-Computer% gcc -O3 inf.c -o inf
Lennarts-Computer% time ./inf
Almost infinity is 18.9979
1.585u 0.009s 0:01.62 97.5% 0+0k 0+0io 0pf+0w
And now the Haskell code:
import BASIC
main = runBASIC' $ do
10 LET I =: 1
20 LET S =: 0
30 LET S =: S + 1/I
40 LET I =: I + 1
50 IF I <> 100000000 THEN 30
60 PRINT "Almost infinity is"
70 PRINT S
80 END
And running it:
Lennarts-Computer% ghc --make Main.hs
[4 of 4] Compiling Main ( Main.hs, Main.o )
Linking Main ...
Lennarts-Computer% ./Main
Almost infinity is
18.9979
CPU time: 1.57s
As you can see it's about the same time. In fact the assembly code for the loops look pretty much the same. Here's the Haskell one:
LBB1_1: ## _L4
movsd LCPI1_0, %xmm2
movapd %xmm1, %xmm3
addsd %xmm2, %xmm3
ucomisd LCPI1_1, %xmm3
divsd %xmm1, %xmm2
addsd %xmm2, %xmm0
movapd %xmm3, %xmm1
jne LBB1_1 ## _L4
And the assembly we get from GHC with just the obvious recursive implementation in Haskell,
ReplyDeletego:
comisd .LC0(%rip), %xmm5
jb .L4
movapd %xmm6, %xmm5
jmp *(%rbp)
.L4:
movsd .LC1(%rip), %xmm0
movapd %xmm0, %xmm7
divsd %xmm5, %xmm7
addsd %xmm0, %xmm5
addsd %xmm7, %xmm6
jmp go
Not too bad, but the jump in the middle is a bit awkward.
Playing around with the C example a bit, switching to int from double changes the speeds drastically.
ReplyDeleteAny idea why that happens?
Jonathan: it doesn't really do anything interesting with int instead of double.
ReplyDeletePlaying around with the C example a bit, putting a return at the beginning of main changes the speeds drastically.
ReplyDeleteAny idea why that happens?
I truly am sorry, but that was just far too tempting.
@Lennart
ReplyDeleteNo doubt, it makes the benchmark nonsensical. I more meant "what happens with a similarly styled benchmark that uses ints instead of doubles?". I didn't know if one possibility was that gcc was particularly good at optimizing integer operations (for whatever reason)
@David
Nice.
What is behind the BASIC module? Or is it a secret? :)
ReplyDeleteYou can get a little bit more speed out of the c code by rewriting the loop as follows:
ReplyDeletefor (i = 0; i < 100000000;)
s += 1/++i;
"Labels: BASIC, ..." I am looking forward to many more posts labeled BASIC.
ReplyDeleteJonathan: gcc optimized the double and int loops in a very similar way (as you can see from looking at the assembly code), it's just that int operations are much faster.
ReplyDeleteDavid: Yes, moving the increment gives about 1-2% improvment, probably because the ucomisd instruction can be moved two instructions before the ja, so there's less of a pipeline bubble when the jump happens.
ReplyDeleteHere's what I got for the C code:
ReplyDeleteL2:
movapd %xmm3, %xmm0
divsd %xmm1, %xmm0
addsd %xmm0, %xmm2
addsd %xmm3, %xmm1
ucomisd %xmm1, %xmm4
ja L2
If I make i an int (idiom) and use "s += 1.0/i" I get a minor speed-up (about 2%). The assembly now looks like:
L2:
cvtsi2sd %eax, %xmm0
movapd %xmm2, %xmm3
divsd %xmm0, %xmm3
addsd %xmm3, %xmm1
addl $1, %eax
cmpl $500000000, %eax
jne L2
Anyway, that's some pretty decent code generation for Haskell-BASIC.