Benchmarking ray tracing, Haskell vs. OCaml
On his web site about OCaml Jon Harrop has a benchmark for a simple ray tracing program written in a number of languages. When I saw it I wondered why Haskell was doing so badly, especially since this benchmark was taken as some kind of proof that Haskell performs poorly in "real life". So I have rerun the benchmarks. I've also rewritten the Haskell versions of the programs. The Haskell versions on Jon's web site were OK, but they were too far from the OCaml versions for my taste. I prefer to keep the programs very similar in a situation like this. My rewrite of the benchmarks from OCaml to Haskell was done with a minimum of intelligence. Here are the only things I did that needed creative thought:- Use the type Vector to represent vectors instead of a tuple. This allows the components to be strict.
- Use the type Scene instead of a tuple to represent a scene. The tuple used in the OCaml code uses the dubious feature of equi-recursive types (even Xavier thinks it's strange enough to have a flag to enable it).
- Rewrite the loop that computes a pixel's value using an accumulating updatable variable into a list comprehension that sums the list.
- Finally, the compiler flags needed a bit of tweaking to get good performance, even though "-O3 -funbox-strict-fields -fexcess-precision -optc-ffast-math" were pretty obvious.
To make the program size comparison fair I removed some dead code from the OCaml code.
I then reran the benchmarks using Haskell, OCaml, and C++.
The benchmarks are five programs that starts from very simple ray tracing and specializing the program more and more to speed it up.
The numbers are in the tables below. The time is execution time in second, the characters are non-white characters in the file, and the lines are the number of lines in the file. To ease comparison I also include the relative numbers compared to OCaml (smaller numbers are better).
Interestingly, and unlike Jon's benchmark, the Haskell code is always smaller than the OCaml code. Furthermore, the Haskell code ranges from much faster to slightly faster than the OCaml code. Again, this is very unlike Jon's benchmark. I find the unoptimized version of the benchmark especially interesting since Haskell is 5 times(!) faster than OCaml. I've not investigated why, but I suspect laziness.
Results
The programs, ray1-ray5, are variations on the ray tracer as given on Jon's web site. I've used the same size metrics as Jon does.- Haskell: My Haskell code compiled with ghc 6.8.1
- Haskell old: Jon's Haskell code, compiled with ghc 6.8.1
- Haskell old 6.6: Jon's Haskell code, compiled with ghc 6.1.1
- OCaml: Jon's OCaml code
- C++: Jon's C++ code
- Time: execution time is second
- Char: number of non-white chracters in the program
- Lines: number of lines in the program
- Rel T: execution time relative to OCaml
- Rel C: non-white characters relative to OCaml
- Rel L: lines relative to OCaml
- Mem: Maximum resident memory
ray1 | Time | Chars | Lines | Rel T | Rel C | Rel L | Mem |
---|---|---|---|---|---|---|---|
Haskell | 15.3 | 1275 | 51 | 0.202 | 0.990 | 1.020 | 5M |
Haskell, old | 15.8 | 1946 | 88 | 0.208 | 1.511 | 1.760 | 9M |
Haskell, old 6.6 | 28.1 | 1946 | 88 | 0.370 | 1.511 | 1.760 | 9M |
OCaml | 75.9 | 1288 | 50 | 1.000 | 1.000 | 1.000 | 18M |
C++ | 8.1 | 2633 | 122 | 0.106 | 2.044 | 2.440 | 8M |
ray2 | Time | Chars | Lines | Rel T | Rel C | Rel L | Mem |
---|---|---|---|---|---|---|---|
Haskell | 11.5 | 1457 | 50 | 0.206 | 0.912 | 0.943 | 12M |
Haskell, old | 12.0 | 2173 | 99 | 0.215 | 1.360 | 1.868 | 35M |
Haskell, old 6.6 | 21.1 | 2173 | 99 | 0.379 | 1.360 | 1.868 | 35M |
OCaml | 55.8 | 1598 | 53 | 1.000 | 1.000 | 1.000 | 15M |
C++ | 6.1 | 3032 | 115 | 0.108 | 1.897 | 2.170 | 8M |
ray3 | Time | Chars | Lines | Rel T | Rel C | Rel L | Mem |
---|---|---|---|---|---|---|---|
Haskell | 9.7 | 1794 | 62 | 0.970 | 0.919 | 0.939 | 12M |
Haskell, old | 11.1 | 2312 | 103 | 1.112 | 1.184 | 1.561 | 35M |
Haskell, old 6.6 | 19.7 | 2312 | 103 | 1.984 | 1.184 | 1.561 | 35M |
OCaml | 10.0 | 1953 | 66 | 1.000 | 1.000 | 1.000 | 15M |
C++ | 5.4 | 3306 | 143 | 0.545 | 1.693 | 2.167 | 8M |
ray4 | Time | Chars | Lines | Rel T | Rel C | Rel L | Mem |
---|---|---|---|---|---|---|---|
Haskell | 8.5 | 1772 | 66 | 0.985 | 0.867 | 0.957 | 12M |
Haskell, old | 11.7 | 2387 | 110 | 1.360 | 1.168 | 1.594 | 36M |
Haskell, old 6.6 | 19.2 | 2387 | 110 | 2.235 | 1.168 | 1.594 | 35M |
OCaml | 8.6 | 2043 | 69 | 1.000 | 1.000 | 1.000 | 11M |
C++ | 5.0 | 3348 | 149 | 0.584 | 1.639 | 2.159 | 8M |
ray5 | Time | Chars | Lines | Rel T | Rel C | Rel L |
---|---|---|---|---|---|---|
Haskell | 7.0 | 2246 | 95 | 0.999 | 0.878 | 0.950 |
OCaml | 7.0 | 2559 | 100 | 1.000 | 1.000 | 1.000 |
C++ | 4.7 | 3579 | 142 | 0.674 | 1.399 | 1.420 |
The source code is available in a Darcs repository.
Software and hardware details
Hardware: MacBook, Intel Core Duo 2GHz, 2MB L2 Cache, 1GB 667MHz DRAMSoftware:
- Haskell compiler: ghc-6.8.1
- OCaml compiler: 3.10.0
- g++: gcc version 4.0.1 (Apple Computer, Inc. build 5367)
- ghc: -O3 -fvia-C -funbox-strict-fields -optc-O3 -fexcess-precision -optc-ffast-math -funfolding-keeness-factor=10
- OCaml: -rectypes -inline 100 -ffast-math -ccopt -O3
- g++: -O3 -ffast-math
Some observations
Haskell should really have the definitions of infinity and epsilon_float in a library. They are quite useful. Also, having them in a library would have made the Haskell code somewhat shorter and faster.Converting these programs from OCaml to Haskell was very mechanical; it could almost be done with just sed.
I'm glad version 5 of the benchmark didn't show much improvement, because it's a really ugly rewrite. :)
Note that the code is all Haskell98, no strange extensions (even though -funbox-strict-fields deviates subtly from H98).
Conclusion
Benchmarking is tricky. I'm not sure why my and Jon's numbers are so different. Different hardware, slightly different programs, different software.Haskell is doing just fine against OCaml on this benchmark; the Haskell programs are always smaller and faster.
Edit: Updated tables with more numbers.
PS: Phil Armstrong wrote the Haskell code on Jon's web site and I took some code from his original.
One of my first thoughts when reading this is that you aren't just comparing implementations of the ray tracer, you are also comparing the compilers and (possibly) platforms involved. Harrop used GHC 6.6.1 and didn't list what kind of system it was run on.
ReplyDeleteAre your results due to the rewrites you have done, improvements to GHC, or somewhere in between? I'd say probably the last one, but it is hard to conclude for certain. It also is somewhat troubling that you are comparing your modified version of the Haskell and OCaml code. It might be worthwhile, at least for the sake of forstalling arguments, to run another benchmark against the original OCaml code and see how it fares on your system.
Either way, it is good to see that Haskell is more than competitive.
Great work!
ReplyDeleteDid you look at the memory footprint of the executables?
I've never really had too many worries about the speed of Haskell for ray-tracing, and a 2x drop in performance might be acceptable to many people if it makes code development easier. The thing I'm interested in is how performance degrades as you start stressing your machine out. How do you think the various languages will perform when you have, say, 2Gb of RAM, your scene takes up 1.9GB, and the renders take hours?
Just wanted to say, I agree with Adam's post. Would be nice to see numbers from the original code in the same tables as the rest.
ReplyDeleteI'll try to run the original Haskell programs, and also look at memory foot print.
ReplyDeleteYou use -O3, however -O3 is clamped to be just -O2 in GHC 6.8, but in 6.6 and earlier, -O3 is closer to -Onot, if I recall (as it wasn't maintained, so only some optimisations are enabled). -O3 might therefore be worse than -O2 for the older compilers.
ReplyDeleteIn the shootout we typically use only: -O2 -optc-O2 -optc-march=pentium4 -optc-ffast-math -funbox-strict-fields, so you might double check using -O2 -optc-O2
On the website, it says Philip Armstrong wrote the Haskell code, not Jon Harrop.
ReplyDeleteGreat stuff. I think it shows again that (1) you need to get an expert of each language to get fair benchmarks and (2) we need more material on how to write efficient Haskell programs (on the other hand, that you should use datatypes with strict fields in such situations is described in GHC's User Guide).
ReplyDeleteQuick note: the -ccopt -O3 passed to ocamlopt isn't actually doing anything; -ccopt is only used when compiling C extensions. O'Caml programs themselves are compiled directly to assembly.
ReplyDeleteYeah, I wrote the Haskell version that appears on Jon's webpages. I don't pretend to be an expert Haskell programmer; indeed if you look back in the archives of the haskell-cafe mailing list you'll find a sizable thread on the topic of optimising my version: http://www.haskell.org/pipermail/haskell-cafe/2007-June/027235.html
ReplyDeleteIt's nice to see that GHC 6.8 makes such a big difference to my original code.
From what I remember there are *significant* differences between the 64-bit and 32-bit x86 binaries produced by both the OCaml compiler and GHC, so you need to watch out for that when making performance comparisons (I think Jon is comparing 64-bit versions but ICBW).
As far as the code goes, I had considered making Vector a Num, but decided against it on personal taste grounds. I'm not a big fan of big list comprehensions either, but perhaps you guessed that...
Phil, I'm sorry I forgot to mention you as the original author. I stole code from you after all. :)
ReplyDeleteIt's not clear to me if Jon is comparing 32 or 64 bit versons. If it's 64 bits it could account for the huge differences.
Apology accepted :)
ReplyDeleteI'll probably merge a few bits of yours into my version just for completeness' sake.
I think that Jon is reporting 64-bit timings IIRC.
Your results look similar to the old 32-bit results on my site (the ones in red), which is not surprising if you are still running your 64-bit CPU in legacy mode.
ReplyDeleteWhat results do you get in 64-bit mode?
I don't know what I'd get in 64 bit mode because I don't know how to generate 64 bit binaries on my Mac.
ReplyDeleteBut I know native code generation for 64 bits is not that good for ghc yet.
I'm not sure running in 64 bit mode is always the best (in any language). For this kind of benchmark we're after fastest execution after all, and if you get that in 32 bit mode, fine by me.
But I'll investigate 64 bit mode.
In OCaml and C++, 64-bit is much faster than 32-bit for almost all tasks.
ReplyDeleteLooks like you need the latest version of Mac OS to get 64-bit. I'm quite surprised by that because Linux has been 64-bit for several years now. I had AMD64 Debian Linux on my old laptop, for example.
Anyway, our results agree for your legacy system. On my 64-bit machine, you Haskell is almost twice as fast as the current implementations and shorter to boot. That makes Haskell much more competitive!
So I shall update my pages with your new implementations (and attribution) ASAP. Many thanks.
Would you please explain how you picked the compilation flags?
ReplyDeleteIt's interesting how different figures might be on different platforms. I've just compiled and run ray5 on a 32-bit Windows box (core2duo 2GHz) with GHC 6.8.3 and OCaml 3.10.2 with flags from your post. I've got 29 seconds for Haskell and 9 seconds for OCaml.
ReplyDeleteSeems like your OCaml installation is screwed. Or did you run bytecode variant? ;)
We should parallelize this benchmark in both languages.
ReplyDeleteAnd what about optimisations in the OCaml code ?
ReplyDeleteHas it been natively compiled ?
Is there some use of laziness etc in the OCaml code, as it is possible in OCaml too.
Etc...
@Alp Mestan
ReplyDeleteThe OCaml code is optimized and natively compiled. Laziness is of little benefit in this benchmark. The latest code and results are available on our site although the discussion is out of date.
Ah, Jon, the notorious troll. Amusing to see him trolling Haskell before he switched to F# and added Caml to his hit list.
ReplyDelete