Tuesday, October 30, 2007

Profiling Ruby/C applications

As mentioned in a previous post, the current function for reading plain text datafiles (Dvector::fancy_read) is slow. Really slow. So I decided to switch to a C implementation keeping the same functionalities.

I quickly did a rough translation of the function into C, using basically the same mechanics (and in particular using Ruby regular expressions for parsing), and I was surprised when I found out that I was only winning a factor of around three in the speed of reading. I was even more surprised to see that the reading is O(n2) (reading 100000 lines is around 100 times slower than reading 100 times 1000 lines !). So, I decided I should try my luck with a profiler.

My first step was naively to compile the library with the -pg gcc option, but that didn't produce any output file (although it might have been due to the fact that I forgot to add the switch again for linking). I attributed that to the fact that the whole program should be compiled with the switch, and not only the shared library. So I did write a small C wrapper, compiled it, and ran it. It did produce a gmon.out file, but gprof was unable to give me any interesting information from that. I guess I needed a finer granularity that my own functions, and for that I should have compile Ruby with profiling support. Well. Drop it.

So, I was about to give up when I thought about valgrind. Valgrind also comes with a profiler tool, callgrind. So, I ended up doing the following:

~ valgrind --tool=callgrind ./fancy_read
~ callgrind_annotate callgrind.out.24425| less

The first command runs the program with valgrind, saving data into a file called something like callgrind.out.24425. The second parses this file and displays the number of intructions spent in each of the most significative functions. Here is a extract of the output:

306,777,466  ???:__printf_fp [/lib/libc-2.6.1.so]
226,575,122  ???:0x0000000000041C20 [/lib/libc-2.6.1.so]
121,900,026  ???:0x00000000000805F0 [/usr/lib/libruby1.8.so.1.8.6]
 98,208,889  ???:__strtod_internal [/lib/libc-2.6.1.so]
 90,449,474  ???:0x00000000000482D0 [/lib/libc-2.6.1.so]
 87,300,200  ???:vfprintf [/lib/libc-2.6.1.so]
 84,197,045  ???:ruby_re_search [/usr/lib/libruby1.8.so.1.8.6]
 71,866,550  ???:0x0000000000071D60 [/lib/libc-2.6.1.so]
 65,387,136  ???:0x0000000000071380 [/lib/libc-2.6.1.so]
 62,784,967  ???:0x000000000004A9F0'2 [/usr/lib/libruby1.8.so.1.8.6]
 61,279,027  ???:0x000000000004AC80 [/usr/lib/libruby1.8.so.1.8.6]
 48,357,879  ???:malloc [/lib/libc-2.6.1.so]
 41,915,985  ???:free [/lib/libc-2.6.1.so]
 34,200,000  ???:rb_reg_search [/usr/lib/libruby1.8.so.1.8.6]
 31,432,232  ???:ruby_xmalloc [/usr/lib/libruby1.8.so.1.8.6]

This shows that most of the time is spent displaying the data. Normal, half of my program does only that. Then, a fair amount of time is spent in strtod. Nothing to improve there. Another fair amount is spent in regular expression matching, and then, a significant part of the processing time is actually spent on memory management ! Dreadful ! I guess there's not much more I could do

The conclusion to this is that if you need to profile something, use valgrind. This is much more powerful than gprof !

1 comment:

Waffle Recipes said...

Thank you ffor sharing