So the time is finally here. Qt 4.4.0 was released a few weeks ago and as promised Qt Jambi is right behind. A lot of effort has gone into this one, in addition to supporting all the new Qt features, like Phonon, Webkit, Widgets in Graphics View, XQuery and Qt Concurrent, we also have a seriously improved deployment system, JDBC support and a compile-time checked signal-slot approach for the paranoid. Its a good time to be a Java developer I tell yah! We already mentioned all the featuers in the Qt Jambi 4.4.0 Preview Blog so we won’t repeat ourselves here… (There is a danger in linking to eskils blog, as it links to others again, which again links to others, which in the end proves to be a fairly complex graph, but then again… we are engineers and like that kind of stuff)
Under the cover we’ve also done quite some work. We also did an overhaul of the garbage collection and memory management subsystem and hopefully ironed out all the bumps and dents. We’ve also done some work on the build system, so that our users that build from source have a bit more substantial buildsystem to work with. Previously it was a complex install document, which has been replaced by a simple ant command which just does it all… I was very happy to see that the deployment system & ANT build scripts works well enough for the webstart to look like a plain, normal webstart app:
<resources>
<j2se version="1.5+"/>
<jar href="qtjambi-examples-4.4.0_01.jar"/>
<jar href="qtjambi-4.4.0_01.jar"/>
</resources>
<resources os="Windows" arch="x86">
<j2se version="1.5+"/>
<jar href="qtjambi-win32-msvc2005-4.4.0_01.jar"/>
</resources>
No magic nativejar or anything like that, just the qtjambi-win32-msvc2005-4.4.0_01.jar in the classpath and that is enough to load it, jpeg and svg plugins and all. The good thing is that the files included in the webstart are produced directly by the ant script with all dependencies etc set up properly… (well… almost properly, it took us an evening last week to get it really working, but now it works properly). Because of the fixes to memory management and deployment Eskil and I got these offical diplomas:

So, what more is there to say… Try the webstart with its new demos, download the packages and start hacking!
-
The Qt Jambi Team
The Wide Finder Project is an informal parallel programming competition where the task is to compute web site statistcs from a 218-million line access log. Each entry will be benchmarked on a Sun T2000 with support for 32 hardware threads, giving lots of opportunities for parallel processing.
What makes this really interesting is that the project is not only about performance, but rather about writing code that scales to many CPU cores with as little extra programmer effort as possible. Some results are already in, with OCaml currently in the lead performance-wise.
Each log line looks something like this:
www.example.com - - [17/Jun/2007:21:37:17 -0700] “GET /ongoing/ongoing.atom HTTP/1.1″ 304 - - “-” “Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.4) Gecko/20070515 Firefox/2.0.0.4
Eager to bring the performance lead back to C++ where it belongs I started out writing my own implementation using QtConcurrent and the other Qt APIs. Briefly explained, the code uses QtConcurrent::mappedReduced to multi-thread the code, and then QByteArray::split() twice to iterate over each word in each line. The current version computes the number of hits for each page.
Results: (parsing 100K lines on an 8-core 2.8 GHz Mac Pro)
1 Thread
real 0m2.283s
user 0m1.141s
sys 0m0.249s
2 Threads:
real 0m1.446s
user 0m1.853s
sys 0m0.271s
1.6X speedup.. not too bad.
4 Threads:
real 0m3.186s
user 0m10.643s
sys 0m0.407s
1 second slower that the single-threaded version.. this does not bode well.
8 Threads:
real 0m7.000s
user 0m46.922s
sys 0m0.724s
Seven seconds! We get a nice linear scaling of the run-time as we increase the number of threads, but unfortunately in the wrong direction. The program us spending a lot of user time doing something though, so let’s run it through Shark and see what’s going on:
80% in a spin-lock used by malloc/free. But who is calling malloc that much?
Aha.. QByteArray::split(). While being a very convenient API, split() was clearly not designed for heavy parsing like this. Still, I’d like a less catastrophic impact on the run-time when adding threads, even if the program really is calling malloc/free to often. Let’s try with the ptmalloc memory allocator instead:
8 Threads:
real 0m0.908s
user 0m3.784s
sys 0m0.533s
ptmalloc is used in GNU/Linux though the GNU C library and scales much better on multicore systems. The program itself still does not scale beyond 4 threads, but it does not get significantly worse either when adding threads. I guess it’s debatable whether or not this is qualifies as a bug in the Darwin memory allocator, but at least ptmalloc shows that it is possible to do better.
That’s all for now
For the next installment I’ll try to get better scaling, at the expense of increasing the developer effort.
opensource: del.icio.us tag/opensource
Framework
Java
Library
messaging
concurrent
message
opensource
opensource: del.icio.us tag/opensource
Framework
Java
Library
messaging
concurrent
message
opensource
Java
messaging
concurrent
message
erlang
concurrency
passing