We’re happy to announce the first version of the Gearman rewrite in C, along with some interesting new MySQL UDFs based on the C library. Check out the Gearman wiki for an overview of what this is, download details, and API documentation.
For the anxious, here is a quick how-to:
Download: http://launchpad.net/gearmand/trunk/0.1/+download/gearmand-0.1.tar.gz
tar xzf gearmand-0.1.tar.gz
cd gearmand-0.1/
./configure
make
make install
You should now have the job server (gearmand) installed in /usr/local/bin, along with the headers and libraries in /usr/local/include and /usr/local/lib. You can now see it work by running some simple clients and workers in the examples/ directory:
gearmand & (this is assuming /usr/local/bin is in your path)
cd examples
./reverse_worker &
./reverse_client "Hello, Gearman!"
If everything went well, your terminal should look something like this:
> gearmand &
[1] 2270
> ./reverse_worker &
[2] 2301
> ./reverse_client "Hello, Gearman!"
Job=H:lap:1 Workload=Hello, Gearman! Result=!namraeG ,olleH
Result=!namraeG ,olleH
What happened is the reverse_client program sent the string “Hello, Gearman!” to the job server (gearmand), telling it to run it with a function named “reverse”. The job server then found a registered worker for that function (the reverse_worker we started), and forwarded the job on to it. The reverse_worker program then reversed the string and sent it back to the job server, which then sent it back to the original reverse_client. The “Job=…” line is the output from the reverse_worker, saying it received a job with the given workload and sent a result back. The “Result=…” line is the final output from the reverse_client. You have now run your first Gearman job!
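If it helps to see that round trip as code, here is a minimal in-process sketch in Python. It is only a toy model of the flow described above: ToyJobServer, register, submit, and the "H:toy:N" job handles are names I made up for illustration, and nothing here is the real Gearman protocol or the C library API.

```python
class ToyJobServer:
    """In-process stand-in for a job server (hypothetical, not gearmand)."""

    def __init__(self):
        self.workers = {}  # function name -> list of registered worker callables
        self.next_id = 1

    def register(self, function_name, worker):
        # A worker announces which function name it can handle.
        self.workers.setdefault(function_name, []).append(worker)

    def submit(self, function_name, workload):
        # A client submits a job; the server forwards it to a registered
        # worker and relays the worker's result back to the client.
        candidates = self.workers.get(function_name)
        if not candidates:
            raise LookupError(f"no worker registered for {function_name!r}")
        job_handle = f"H:toy:{self.next_id}"  # made-up handle format
        self.next_id += 1
        return candidates[0](job_handle, workload)


def reverse_worker(job_handle, workload):
    # The worker does the actual work and reports what it did.
    result = workload[::-1]
    print(f"Job={job_handle} Workload={workload} Result={result}")
    return result


server = ToyJobServer()
server.register("reverse", reverse_worker)

# The "client" side: send a workload, get the result back.
result = server.submit("reverse", "Hello, Gearman!")
print(f"Result={result}")
```

The real system does the same thing across processes and machines, with the job server sitting between clients and workers so neither side needs to know about the other.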
Before getting into the MySQL UDFs, let’s look at how this could scale in a useful way. Imagine your reverse_client is actually a client inside of your PHP or Perl script running on your webserver. Next, you could have multiple job servers to spread the load and get redundancy, and then you could have a farm of machines running the workers. You now have a simple distributed computing framework to do time and/or resource intensive jobs like document or image conversion.
Now let’s see how we can make MySQL run Gearman jobs:
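To make the "farm of workers" idea a bit more concrete, here is a rough Python sketch where a shared queue plays the role of the job server and a handful of threads play the role of worker machines. Everything in it is an assumption for illustration: convert() is a placeholder for a slow job such as document or image conversion, and a real deployment would use separate processes on separate hosts talking to gearmand.

```python
import queue
import threading


def convert(document):
    # Placeholder for a slow, resource-intensive job (hypothetical).
    return document.upper()


def worker_loop(jobs, results):
    # Each worker keeps pulling jobs until it sees the None shutdown marker.
    while True:
        item = jobs.get()
        if item is None:
            break
        job_id, payload = item
        results[job_id] = convert(payload)


jobs = queue.Queue()  # stands in for the job server's queue
results = {}          # job id -> result, filled in by the workers

# Start a small "farm" of four workers.
farm = [threading.Thread(target=worker_loop, args=(jobs, results)) for _ in range(4)]
for thread in farm:
    thread.start()

# The "web server" side submits jobs without caring which worker runs them.
documents = ["a.doc", "b.doc", "c.doc", "d.doc", "e.doc"]
for job_id, document in enumerate(documents):
    jobs.put((job_id, document))

# One shutdown marker per worker, then wait for the farm to drain the queue.
for _ in farm:
    jobs.put(None)
for thread in farm:
    thread.join()

print(sorted(results.items()))
```

Adding capacity in this model just means starting more workers, which is the property that makes the real thing attractive.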
Download: http://launchpad.net/gearman-mysql-udf/trunk/0.1/+download/gearman-mysql-udf-0.1.tar.gz
tar xzf gearman-mysql-udf-0.1.tar.gz
cd gearman-mysql-udf-0.1/
./configure --with-mysql=/usr/local/mysql/bin/mysql_config --libdir=/usr/local/mysql/lib/plugin/
make
make install
You may need to change your configure line depending on where you have MySQL installed (this is assuming /usr/local/mysql). Configure needs to know where the mysql_config tool is and where to install the plugin. The above assumes that your plugins should be in lib/plugin, but they may just be in lib. You will need to check your MySQL installation to see what paths you should use here.
Once you get the paths correct and everything installed, you can load the UDFs through the MySQL command line tool:
CREATE FUNCTION gman_do RETURNS STRING
SONAME "libgearman_mysql_udf.so";
CREATE FUNCTION gman_do_high RETURNS STRING
SONAME "libgearman_mysql_udf.so";
CREATE FUNCTION gman_do_background RETURNS STRING
SONAME "libgearman_mysql_udf.so";
CREATE AGGREGATE FUNCTION gman_sum RETURNS INTEGER
SONAME "libgearman_mysql_udf.so";
CREATE FUNCTION gman_servers_set RETURNS STRING
SONAME "libgearman_mysql_udf.so";
If you get errors, the most likely case is that MySQL cannot find the module in the library path. Try a different libdir in the configure above and reinstall. Once these are loaded, you can now tell it where to find a job server and then run a Gearman job from a query. This test assumes you still have gearmand and reverse_worker running in the terminal from the previous example.
mysql> SELECT gman_servers_set("127.0.0.1");
+-------------------------------+
| gman_servers_set("127.0.0.1") |
+-------------------------------+
| NULL                          |
+-------------------------------+
1 row in set (0.00 sec)
mysql> SELECT gman_do("reverse", Host) AS test FROM mysql.user;
+-----------+
| test      |
+-----------+
| 1.0.0.721 |
| pal       |
| pal       |
| tsohlacol |
| tsohlacol |
+-----------+
5 rows in set (0.00 sec)
Now you have a MySQL UDF that runs as a normal Gearman client! See the README for more information on how to run other types of jobs. I’ll be writing another blog entry shortly with a more interesting use case using these UDFs.
So what are the next steps for Gearman? We’ll be improving the server and clients a bit more, building a new PHP extension based on the C library, adding more language wrappers using SWIG, working on persistent queues and queue replication, and seeing how we can plug this into other applications.
The gearmand job server written in Perl (current production server from Danga):
mysql> SELECT length(gman_do("reverse", repeat('x',10000000))) AS test;
+----------+
| test     |
+----------+
| 10000000 |
+----------+
1 row in set (49.08 sec)
The new gearmand job server written in C:
mysql> SELECT length(gman_do("reverse", repeat('x',10000000))) AS test;
+----------+
| test     |
+----------+
| 10000000 |
+----------+
1 row in set (0.30 sec)
Mmm, efficiency. Oh, and are those some new MySQL UDFs? Much more coming soon…
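To put the two timings above in perspective, here is the arithmetic as a tiny Python snippet. The only inputs are the numbers from the two query results (a 10,000,000 byte workload, 49.08 seconds and 0.30 seconds); the rate counts the payload once, even though it travels to the worker and back, so treat it as a rough figure rather than a benchmark.

```python
payload_bytes = 10_000_000  # repeat('x', 10000000)
perl_seconds = 49.08        # Perl job server
c_seconds = 0.30            # C job server

speedup = perl_seconds / c_seconds
perl_rate = payload_bytes / perl_seconds / 1e6  # MB of payload per second
c_rate = payload_bytes / c_seconds / 1e6

print(f"speedup: {speedup:.1f}x")
# speedup: 163.6x
print(f"Perl: {perl_rate:.2f} MB/s, C: {c_rate:.2f} MB/s")
# Perl: 0.20 MB/s, C: 33.33 MB/s
```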
I spent this past week down in San Jose, CA at my employer’s office for team meetings and to officially kick off my next big project. The design and architecture were very well received, and I drummed up some excitement about Gearman and about working with the OSS community in general (which we’ve not done too much of in the past). We’ll be developing it entirely on Launchpad under GPLv2, and I’ll be writing a number of blog posts covering each component in detail. Why would anyone else find this interesting? It covers many topics of how to write a high-performance application in the cloud. Specific topics will include Gearman, persistent Gearman queues, eventual consistency data models (and related schemas), lightweight Map/Reduce for real-time applications, and how to combine all this with MySQL and/or Drizzle to build an e-mail storage infrastructure in the cloud. While the use case may be specific to e-mail, its concepts (and much of the code) will be easy to translate to a number of other applications. Also, a number of other projects will be spawned from this, such as a faster Gearman job server written in C (currently it is in Perl), a generic persistent queuing system based on Gearman, and Gearman UDFs for MySQL and Drizzle (those are coming very soon).

This was the final result of a long white-boarding session. Now off to code it…
Like a few others I’ve seen this week, I had two proposals for the MySQL Conference & Expo 2009 accepted. I’m very excited for both topics (for different reasons), and will be blogging about each of them in more detail soon. They are:
In case you don’t know, there is a trend on the internet known as Rickrolling:
Rickrolling is an Internet meme typically involving the music video for the 1987 Rick Astley song “Never Gonna Give You Up”. The meme is a bait and switch: a person provides a Web link they claim is relevant to the topic at hand, but the link actually takes the user to the Astley video.
When a person clicks on the link given and is led to the web page he/she is said to have been “Rickrolled” (also spelled Rickroll’d). By extension, it can also mean playing the song loudly in public in order to be disruptive.
So there I was, watching the Macy’s parade. The dopey Cartoon Network float is rolling by, and in the midst of whatever song they were doing, the whole thing stops and out pops Rick Astley. I have not laughed that hard in at least a year.
Last weekend I attended the OpenSQL Camp in Charlottesville, VA. There was a great turnout, and Baron did an excellent job organizing it! I saw a few folks I met at OSCON over the summer, along with meeting many new people. What a great group: intelligent, fun, and they know how to get things done. I had some great conversations, especially with Brian, Stewart, Arjen, Patrick, Mark, and Jay. The food was great too, which was a relief since I was a bit worried about finding vegan food there. Oh, and there was the wine bar, and my newfound love for dessert wine. Yum.
All the sessions I attended were great! Postgres MVCC by Greg Sabino Mullane, Sphinx by Peter Zaitsev, MySQL Self Monitoring Replication by Giuseppe Maxia, Postgres Extensions by Kelly McDonald, Google Proto Buffers by Jay Pipes, OurDelta by Arjen Lentz, and Join-Fu by Jay Pipes. The hackathon on Sunday was fun, many more good conversations and project planning for Drizzle and Gearman.
I had planned to give just one presentation on libdrizzle (slides), but ended up giving another with Brian on gearman (slides). Excuse the gearman slides, they’re a bit weak, but in our defense we threw them together 15 minutes before the talk (it was proposed only a few hours before). There were people really enthusiastic about both talks, and I received some great feedback for libdrizzle.
I’m continuing with the Drizzle and Gearman development with all the spare time I can find, and making good progress on both. I’m in the process of Doxygen-izing both projects, and plan to have some code for people to test really soon!
So, I installed one of those math question anti-spam plugins for WordPress comments. It stopped the spam! Unfortunately it also stopped all valid comments when the answer was right, although it did work when I first set it up. So if you tried leaving a comment (I see a few were denied) for one of the recent Drizzle or MySQL related posts (like the New libdrizzle), please try again. :)
Anyone have a suggestion for a good anti-spam plugin for WordPress comments? I’ve looked a bit but nothing really stands out.
What’s the new libdrizzle? It’s a complete rewrite of the client library for the Drizzle project, but it also has full support for the current MySQL protocol (4.1+). Right now Drizzle uses the same protocol as MySQL, but work is being done to design a new, more robust protocol. Even when the Drizzle protocol changes, I plan to keep full support for the MySQL protocol since there is a need for a good low-level non-blocking client library. Also, once libdrizzle turns into a full protocol library (server and client packets) it could make for some interesting Drizzle/MySQL hybrid proxies.
Over the weekend I made a lot of progress on the new library, and last night I just finished up the first pass at the core functionality. It can now do full query and result processing. Currently only non-cached results are being used (no store_result() function), but the cached interfaces will be easy to add in. The nice thing about the non-cached results is that besides a few core structs, no malloc()s are happening. This can be very useful for applications that just need to stream data and don’t want to allocate a huge chunk for each BLOB.
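The difference between cached and non-cached results is easy to show with a small Python analogy. This is not the libdrizzle API: fetch_rows() is a hypothetical stand-in for a result set that hands back one row at a time, and the two consumers below just contrast streaming each row with buffering the whole result first (the store_result() style).

```python
def fetch_rows(n_rows, blob_size):
    """Hypothetical result set: yields (id, blob) rows one at a time."""
    for row_id in range(n_rows):
        yield (row_id, b"x" * blob_size)


def stream_total(rows):
    # Non-cached style: handle each row as it arrives, then let it go.
    # Only one row's BLOB is held in memory at any moment.
    total = 0
    for _row_id, blob in rows:
        total += len(blob)
    return total


def buffered_total(rows):
    # Cached style: hold the entire result set in memory before touching it.
    cached = list(rows)
    return sum(len(blob) for _row_id, blob in cached)


streamed = stream_total(fetch_rows(100, 10_000))
buffered = buffered_total(fetch_rows(100, 10_000))
print(streamed, buffered)
```

Both give the same answer; the streaming version simply never needs memory proportional to the full result, which is the point of the non-cached interface for large BLOBs.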
Here are two tests using the new library and example client, each opening up 50 connections (-c 50 option) to the server and running a query. The first uses non-blocking I/O and allows the connections to run concurrently, while the second uses blocking I/O (-b option) and performs each connection/query serially.
lap> time ./client -h host -u sql -p pass -c 50 "SELECT * FROM mysql.user" > /dev/null
0.020u 0.024s 0:00.30 13.3% 0+0k 0+0io 0pf+0w
lap> time ./client -h host -u sql -p pass -c 50 -b "SELECT * FROM mysql.user" > /dev/null
0.020u 0.012s 0:09.70 0.3% 0+0k 0+0io 0pf+0w
Now, this test doesn’t show too much besides the amount of network latency between my laptop and the server, but it does show how the non-blocking interface allows a single thread to run concurrent connections with relatively little overhead. Now, for more practical usage, imagine a PHP script that needs to make multiple, time consuming queries. If they are all happening in parallel, your total page response time is only as long as the most time consuming query (plus a little overhead).
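The same effect can be reproduced without a database at all. Below is a small Python asyncio sketch where run_query() is a hypothetical stand-in that just sleeps to simulate network latency; LATENCY and CONNECTIONS are made-up values, not measurements from the test above. Running the simulated queries concurrently in one thread takes roughly one latency period, while running them one after another takes roughly the sum.

```python
import asyncio
import time

LATENCY = 0.02    # simulated round-trip time per query, in seconds (assumed)
CONNECTIONS = 10  # number of simulated connections (assumed)


async def run_query(conn_id):
    # Stand-in for connect + query + read result on one connection.
    await asyncio.sleep(LATENCY)
    return conn_id


async def concurrent():
    # Non-blocking style: all connections are in flight at once.
    return await asyncio.gather(*(run_query(i) for i in range(CONNECTIONS)))


async def serial():
    # Blocking style: each connection finishes before the next one starts.
    return [await run_query(i) for i in range(CONNECTIONS)]


def timed(coro_fn):
    start = time.perf_counter()
    rows = asyncio.run(coro_fn())
    return list(rows), time.perf_counter() - start


concurrent_rows, concurrent_time = timed(concurrent)
serial_rows, serial_time = timed(serial)
print(f"concurrent: {concurrent_time:.3f}s, serial: {serial_time:.3f}s")
```

This is the PHP page scenario in miniature: the total time for the concurrent case tracks the slowest query rather than the sum of all of them.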
Next steps:
After an amazing summer of getting married, graduating college, adopting a new dog, and taking a much needed extended vacation, I’m now back on track with where I left off. I’ve been making good progress on the asynchronous MySQL library I talked about in this post in the form of a new drizzle client library. This library is also compatible with MySQL since they share the same protocol, and if drizzle changes in the future I plan on supporting current and new MySQL protocols as well. All of the connection and I/O overhead is mostly done, and I’m just working through the protocol bits now. I’m hoping to have something ready to show and talk about for the OpenSQL camp.
In other news, I’ve recently started working with the gearman project. I proposed building upon it for a distributed cloud application framework at my job and they were excited. I’ve started contributing to Brian Aker’s C implementation to get it production ready, and will also be adding some new features like persistent message queues and better large query support. After wrapping some other things up I’m going to start working on gearman full-time, including creating documentation for the project on a wiki I set up here.
Oh, and following up from my post about Macs, I’ve jumped ship. With the exception of an iPod, I’m now free of Apple and quite happy being back on a Linux desktop. I ended up switching to because the Slackware package system just doesn’t compete (sorry Slackware, it’s been fun).