Percona is looking to hire someone to develop Maatkit, among other things.
If I weren’t having so much fun being the consulting team lead, I’d be doing it myself. (In fact, I’m still hacking on it a lot. Got some pretty fun stuff done this weekend.) I don’t know what the rest of the world thinks, but I think Maatkit is a damn enjoyable project to work on. Hopefully someone else will have the same kind of mindset and want to get paid for it, unlike poor working-on-the-weekends me.
I’m not stepping away from the project. It’s just grown a lot, and there is room and money to grow it much more. This is actually the best compliment to the project: that it is worth hiring someone to keep improving it. Lots of people are using it, and there’s a lot of stealth-mode stuff I/we want to do with it too.
On a related note, who wants me to order another batch of Maatkit t-shirts? I’ve gotten quite a few questions about it.
My post on what it’s like to write a technical book was a stream-of-consciousness look at the process of writing High Performance MySQL, Second Edition. I got a lot of responses from it and learned some neat things I wouldn’t have learned if I hadn’t written the post. I also got a lot of questions, and my editor wrote a response too. I want to follow up on these things.
I really intended to write the post as just “here’s what it’s like, just so you’re prepared.” But at some point I got really deep into it and lost my context. That’s when I started to write about the things that didn’t go so smoothly with the publisher, and some of these things had a little extra sting in them that I would have done well to edit out.
All of us are human and the process wasn’t that bad, all things considered — the book was just a massive project that put huge demands on all of us and stressed everything from the capabilities of our chosen tools to our patience. As the editor points out in his response to my blog post, this is precisely why nobody else has ever been able to pull this off. This book stands head and shoulders above the crowd. It’s just hard to write, and very few people in the world actually have the knowledge to do it, much less the time, inclination, and ability.
Everything I said was (I believe) factual and correct, although as the editor points out there are different stories behind them. I also want to mention that I’d shared all those concerns with my editor; I avoid criticizing people behind their backs. In hindsight, throwing all of my concerns onto a blog post without warning isn’t the kind of thing I like to do either.
So I believe I was honest, but unfair to the editor. I’ve apologized to him. And by the way, yes, I would work with him again, and I fully expect that it would be easier because I have learned more about the process.
I ran this post by my editor before publishing it.
Several people asked me to say more about my heuristics for improving the quality of the writing. I’ve already explained many of them, but here are a few more:
The tools I used to find sentences and phrases that score badly on some readability metric were pretty helpful to me as I tightened the writing up more and more. Nobody has reviewed the book yet, but I think when they do, they’ll be unlikely to mention “oh, and by the way the writing is wonderfully compact!” If we pulled this off right, you won’t notice that the writing is clear and compact. Writing is like a stereo system: you’re supposed to hear the music, not the speakers.
Anyway, my point is that we expanded the first edition’s actual coverage many times over, and ended up with only 658 pages of actual material. So the writing is much more compressed, and to do that you have to find and eliminate confusing writing. Confusing writing usually means that the concepts don’t flow clearly, and it takes more words to say the same thing because you’re kind of bumbling about, gesturing at your meaning from several angles instead of saying it clearly just once.
Here’s how I analyzed each chapter:
As I wrote in my previous post, the analyzer uses a combination of readability metrics and “other stuff” to measure the badness of each sentence and paragraph. It aggregates sentences and paragraphs by the metrics. I calculated the number of words, percent of complex words, syllables per word, number of sentences, words per sentence, and a bunch of other things, as well as the standard readability metrics. Each sentence and paragraph got scored on these. Then I printed overall metrics, and sorted the sentences and paragraphs worst-first and printed out a snippet of the offending text. Here’s a sample of chapter 3’s metrics (originally numbered chapter 4) at some intermediate stage in the writing process.
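For readers who want to try the worst-first idea, here is a minimal Python sketch. It is not the original Perl analyzer. It assumes a crude vowel-group syllable counter and treats words of three or more syllables as "complex". It uses only the Gunning fog index as the badness score. All function names are hypothetical.

```python
import re

def count_syllables(word: str) -> int:
    """Crude estimate: count runs of vowels, discounting a silent final 'e'."""
    w = word.lower()
    count = len(re.findall(r"[aeiouy]+", w))
    if w.endswith("e") and not w.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def sentence_metrics(sentence: str) -> dict:
    """Per-sentence word count, percent complex words, syllables/word and fog index."""
    words = re.findall(r"[A-Za-z']+", sentence)
    n = len(words)
    if n == 0:
        return {"words": 0, "pct_complex": 0.0, "syll_per_word": 0.0, "fog": 0.0}
    syllables = [count_syllables(w) for w in words]
    complex_words = sum(1 for s in syllables if s >= 3)
    pct_complex = 100.0 * complex_words / n
    # Gunning fog = 0.4 * (words per sentence + percent complex words);
    # for a single sentence, words per sentence is simply n.
    fog = 0.4 * (n + pct_complex)
    return {"words": n, "pct_complex": pct_complex,
            "syll_per_word": sum(syllables) / n, "fog": fog}

def split_sentences(text: str) -> list:
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def worst_first(text: str, snippet_len: int = 40) -> list:
    """Return (fog score, snippet) pairs sorted from worst to best."""
    scored = [(sentence_metrics(s)["fog"], s[:snippet_len]) for s in split_sentences(text)]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored

if __name__ == "__main__":
    sample = ("Go now. Administrative reorganization complicates "
              "everything considerably.")
    for score, snippet in worst_first(sample):
        print(f"{score:6.1f}  {snippet}")
```

A short sentence made entirely of long words scores very badly here. That is the same effect that makes short bullet points show up as false positives.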
This was a lot of work. If I had been writing with Vim, I could have done better. I could have used the compiler integration and set my “make” program to the analysis program. If you use Vim and you don’t know about this, it’s a pity. My next book will be written in Vim, by the way.
Actually, I probably could have done better regardless, but this was good enough. I just searched for the snippets and then examined what was going on.
There were some false positives. For example, bullet-points often scored badly on the readability metrics, and so a five-word bullet point item would look like terrible writing just because it was short enough that it had a high percentage of complex words. It’s not an exact science. Maybe next time will be better.
If you’d like to see the source code, here’s the clean_text.pl and here’s the analyze_text.pl. Enjoy!
I have been using Maatkit in a different way since I joined Percona as a consultant. When I’m working on a system now, it’s a new, unfamiliar system, not one where I have already installed my favorite programs. And that means I want to grab my favorite productivity tools fast.
I intentionally wrote the Maatkit tools so they don’t need to be “installed.” You just run them, that’s all. But I never made them easy to download.
I fixed that. Now, at the command line, you can just run this:
wget http://www.maatkit.org/get/mk-table-sync
Now it’s ready to run. Behind the scenes are some Apache mod_rewrite rules, a Perl script or two, and Subversion. When you do this, you’re getting the latest code from Subversion’s trunk.[1][2] (I like to run on the bleeding edge. Releases are for people who want to install stuff.)
Because there’s some Perl magic behind it, I made it even easier — it does pattern-matching on partial names and Does The Right Thing:
baron@kanga:~$ wget http://www.maatkit.org/get/sync
--21:38:50-- http://www.maatkit.org/get/sync
=> `sync'
Resolving www.maatkit.org... 64.130.10.15
Connecting to www.maatkit.org|64.130.10.15|:80... connected.
HTTP request sent, awaiting response... 302 Moved
Location: http://www.maatkit.org/get/mk-table-sync [following]
--21:38:50-- http://www.maatkit.org/get/mk-table-sync
=> `mk-table-sync'
Connecting to www.maatkit.org|64.130.10.15|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/x-perl]
[ <=> ] 163,259 136.51K/s
21:38:51 (136.13 KB/s) - `mk-table-sync' saved [163259]
The redirection is there because otherwise wget will save the file under the name ‘sync’ instead of ‘mk-table-sync’.
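The partial-name lookup could work roughly as follows. This is a hypothetical Python sketch, not the actual mod_rewrite rules or Perl script behind maatkit.org:

- An exact name is served directly.
- A unique substring match gets a 302 redirect to the full name, so wget saves the file under the right name.
- An ambiguous request returns 300 Multiple Choices.

```python
# Tool names from the Maatkit changelog (not necessarily the complete list).
TOOLS = [
    "mk-archiver", "mk-deadlock-logger", "mk-duplicate-key-checker", "mk-find",
    "mk-heartbeat", "mk-parallel-dump", "mk-parallel-restore", "mk-query-profiler",
    "mk-show-grants", "mk-slave-delay", "mk-slave-find", "mk-slave-move",
    "mk-slave-prefetch", "mk-slave-restart", "mk-table-checksum", "mk-table-sync",
    "mk-visual-explain",
]

def resolve(request: str, tools=TOOLS):
    """Map a requested name to an (HTTP status, payload) pair."""
    if request in tools:
        return 200, request                   # exact name: serve the file
    matches = [t for t in tools if request in t]
    if len(matches) == 1:
        return 302, "/get/" + matches[0]      # unique partial match: redirect
    if matches:
        return 300, matches                   # ambiguous: let the user choose
    return 404, None                          # nothing matches

if __name__ == "__main__":
    print(resolve("sync"))   # (302, '/get/mk-table-sync')
```

Substring matching is the simplest rule that handles "sync". A requirement for a unique match keeps a request like "table" from silently picking the wrong tool.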
And if you’ve forgotten which tools exist, you can just click on over to http://www.maatkit.org/get/ and see.
A quick poll: instead of getting the latest trunk, should this give you the code from the last release? I can do that, if you want.
[1] OK, it’s only refreshed every hour. So you’re getting code that’s up to an hour old.
[2] update: now /get/foo gets the latest release, and /trunk/foo gets the latest trunk code.
Update: Karanbir says “Just one thing to keep in mind is that we dont want too many people using it from the Testing repository - we only need enough feedback to move it from testing to stable ( and to be honest, there are already 8 people who have said yes it works - so move to stable should happen within the next 24 - 48 hrs ). Once the package is in stable, users on CentOS4 and 5 wont need to do anything more than just ‘yum install maatkit’ and it will install for them.”
At least one person (Karanbir Singh) is working to get Maatkit into the CentOS repositories, and I believe there might be movement towards RHEL also. From an email to the Maatkit discussion list a little while ago:
I am in the process of getting maatkit into the CentOS-Extras repositories. The first step for that is that every package needs to go into a CentOS-Testing repo and feedback is required from the project and users on its stability / usability and packaging quality.
maatkit-1887 is now available in the CentOS-Testing[*] repo’s and as soon as we can get some feedback ( needs to be 5 different people, none of whom can be CentOS Developers ) - the packages will move into the main repository so that all users can get access.
I’d appreciate it if people on this were able to give those packages a go and let me know if there are any issues. You can leave feedback :
- via the maatkit-discuss mailing list (http://sourceforge.net/mailarchive/forum.php?forum_name=maatkit-discuss)
- on the centos-devel list ( http://lists.centos.org ) or
- http://bugs.centos.org/ against category ‘maatkit’
[*] : Info about the Testing repo and howto set it up on your machine : http://wiki.centos.org/Repositories
If you’re interested in getting Maatkit into these repositories, please take a moment and give the requested feedback. I can’t do it because it would be a conflict of interest for the main developer to assert that the code is stable and usable.
It’s been a while since I had a chance to work on Informeer, as my itch has been multi-user, web-based password management (AuthStor). Oh, and moving house.
Now that things are settling down again (servers back up and running), I decided to take a break from AuthStor and focus on something new: Informeer.
The concept is simple: centralised notification.
I am forever configuring notifications from several sources, be it backup alerts, host monitoring notifications or even simple applications that send mail via SMTP. In a world of constant change, in both software and business, visiting every application to change an e-mail address or add a new user to a notification schedule can be quite time-consuming. Add to that the effort of modifying firewalls, SMTP servers, XMPP settings and so on. A single web interface where all your notifications can be configured is quite appealing. Informeer aims to make that a reality (one day).
While I take my time with the implementation, I thought I would post a basic intro to XML-RPC using Perl. The RPC::XML Perl module makes writing your own client/server application a piece of cake. It's not quite point-and-click, but it gives you enough flexibility to centralise your own notifications (or anything else, for that matter). It’s not rocket science, but it works!
This tutorial should help get you started if you have been looking for a flexible, quick and easy fix to the centralised notification problem. It should also help if you just want to push some data from a firewalled site to one of your servers.
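The tutorial itself uses Perl's RPC::XML. To illustrate the same client/server idea, here is a minimal sketch that uses Python's standard xmlrpc modules instead. The notify(source, message) method and the in-memory received list are hypothetical stand-ins for a real notification store.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

# In-memory store standing in for a real notification back end (hypothetical).
received = []

def notify(source, message):
    """Record one notification and return how many have been received so far."""
    received.append((source, message))
    return len(received)

def start_server(host="127.0.0.1", port=0):
    """Start an XML-RPC server in a background thread; port 0 lets the OS pick one."""
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    server.register_function(notify, "notify")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

if __name__ == "__main__":
    server = start_server()
    host, port = server.server_address
    # Client side: a backup or monitoring script on another host would do just this.
    proxy = ServerProxy(f"http://{host}:{port}/")
    print(proxy.notify("backup01", "nightly backup finished"))
    server.shutdown()
    server.server_close()
```

Only the client call needs to change when the notification policy changes. Who gets told, and by which channel, stays a server-side decision.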
In between moving house and playing with notifications, I found time to upgrade to WordPress 2.5 and MoinMoin 1.6, and wow, what an improvement in both camps.
The WordPress dashboard is the most visible improvement: it’s amazingly clean! If you have been waiting for 2.5 to settle down or are just cautious about upgrading, I would say take the plunge; you won’t regret it. As for MoinMoin 1.6, there really is no better wiki engine out there (in my opinion), and it just keeps getting better with every release!
That just leaves me to post a quick MySQL tip that may well be obvious, but happens to catch me out from time to time.
If you ever get the urge to convert a 1 GB MyISAM table (with 34 million rows) to InnoDB within MySQL, try to avoid the simple ALTER TABLE tablename ENGINE=INNODB; method.
I made the mistake of running that command on a fairly decent server with bags of space and memory. More than 5 hours later the command was still running, with the InnoDB tablespace at around 4 GB (OK, a bit of tuning might have helped).
Your best bet is to dump the table using mysqldump, drop the existing table, and re-import it after changing the definition to use InnoDB. That worked for me in less than 10 minutes. Alternatively, you can create a new InnoDB table and insert directly from the existing table, as described in the MySQL documentation, e.g. INSERT INTO newtable SELECT * FROM oldtable;
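The "change the definition" step of the dump-and-reload approach can be scripted. This is a minimal Python sketch with these assumptions:

- The input is a plain-text mysqldump file.
- Each CREATE TABLE ends with a line like ") ENGINE=MyISAM DEFAULT CHARSET=latin1;" (older servers write TYPE=MyISAM).
- The script name myisam_to_innodb.py is hypothetical.

```python
import re
import sys

# Matches the table option in either the modern or the old mysqldump spelling.
ENGINE_RE = re.compile(r"\b(?:ENGINE|TYPE)\s*=\s*MyISAM\b", re.IGNORECASE)

def rewrite_engine(line: str) -> str:
    """Switch MyISAM to InnoDB on the closing line of a CREATE TABLE statement.

    Only lines starting with ')' are touched, so row data in INSERT
    statements that happens to contain 'ENGINE=MyISAM' is left alone.
    """
    if line.startswith(")"):
        return ENGINE_RE.sub("ENGINE=InnoDB", line)
    return line

def rewrite_dump(src_path: str, dst_path: str) -> None:
    """Copy a dump file line by line, rewriting the storage engine.

    latin-1 maps every byte to one character, so arbitrary dump bytes
    survive the round trip unchanged; newline="" keeps line endings as-is.
    """
    with open(src_path, encoding="latin-1", newline="") as src, \
         open(dst_path, "w", encoding="latin-1", newline="") as dst:
        for line in src:
            dst.write(rewrite_engine(line))

if __name__ == "__main__" and len(sys.argv) == 3:
    rewrite_dump(sys.argv[1], sys.argv[2])
```

Usage takes three steps:

1. Run mysqldump mydb bigtable > bigtable.sql.
2. Run python3 myisam_to_innodb.py bigtable.sql bigtable_innodb.sql.
3. Run mysql mydb < bigtable_innodb.sql.

By default mysqldump writes DROP TABLE IF EXISTS before each CREATE TABLE, so the reload replaces the original table.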
I’m sure I am not the first to make this mistake, and I probably won’t be the last...
Download MySQL Cacti templates
As promised, I’ve created some improved software for monitoring MySQL via Cacti. I began using the de facto MySQL Cacti templates a while ago, but found some things I needed to improve about them. As time passed, I rewrote everything from scratch. The resulting templates are much improved.
You can grab the templates by browsing the source repository on the project’s homepage.
In no particular order, here are some things I improved:
Cacti templates are very laborious to create if they’re complex at all; it takes a long time and is very error-prone. Instead of doing it through Cacti’s web interface and exporting a huge XML file, I eliminated the redundancies and created a small, easy-to-maintain file from which I generate the XML template with a Perl script. This gives the added benefit of letting me (or you) generate templates with different parameters such as polling interval or graph size. The README file has the full details. However, I’ve pre-generated a set of templates that matches Cacti’s defaults, so you can probably just use that.
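The Python sketch below illustrates the compact-format idea. It expands a small definition into XML, with the polling interval and graph size as parameters. Note the assumptions:

- The element and attribute names are hypothetical and are not Cacti's real template schema. The point is only that each setting is written once and the generator repeats it.
- The default step of 300 seconds matches Cacti's default 5-minute polling interval.
- The heartbeat of twice the step follows common RRDtool practice.
- The width and height defaults are placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical compact definition: graph title -> data items it plots.
COMPACT = {
    "InnoDB Buffer Pool": ["pool_size", "free_pages", "dirty_pages"],
    "MySQL Threads": ["threads_connected", "threads_running"],
}

def generate(compact, poll_interval=300, width=500, height=120):
    """Expand the compact definition into (made-up) verbose XML.

    Every repeated setting is derived from the parameters, so changing the
    polling interval or graph size means regenerating, not hand-editing.
    """
    root = ET.Element("templates")
    for title, items in compact.items():
        graph = ET.SubElement(root, "graph", title=title,
                              width=str(width), height=str(height))
        for name in items:
            ET.SubElement(graph, "item", name=name,
                          step=str(poll_interval),
                          heartbeat=str(poll_interval * 2))
    return ET.tostring(root, encoding="unicode")

if __name__ == "__main__":
    print(generate(COMPACT, poll_interval=60))
```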
This has taken a lot of time. In particular, I spent a lot of time working on it at my former employer, The Rimm-Kaufman Group (kudos to them for letting me open-source the work) and I just spent most of my weekend writing the scripts to convert from the compact format to XML templates, so it’s possible to maintain these beasts. Plus I had to develop the compact format, too. This took a lot of time because I had to understand the Cacti data model, which is pretty complex.
Please enter issue reports for bugs, feature requests, etc. at the Google project homepage, not in the comments of this blog post. I do not look through comments on my blog when I’m trying to remember what I should be working on for a software project.
If these templates help you and you feel like visiting my Amazon.com wishlist and sending something my way, I’d appreciate it!
PS: You may also be interested in Alexey Kovyrin’s list of templates for monitoring servers.
I am happy to announce the release of mylvmbackup version 0.8. mylvmbackup is a tool for quickly creating backups of a MySQL server's data files. To perform a backup, mylvmbackup obtains a read lock on all tables and flushes all server caches to disk, makes an LVM snapshot of the volume containing the MySQL data directory, and unlocks the tables again. The snapshot process takes only a small amount of time. When it is done, the server can continue normal operations, while the actual file backup proceeds.
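The lock/snapshot/unlock sequence can be sketched in Python as follows. This illustrates the steps described above and is not mylvmbackup's own code. Note the assumptions:

- run_sql and run_cmd are hypothetical callbacks that you supply.
- The volume path and snapshot size are placeholders.
- FLUSH TABLES WITH READ LOCK is assumed as the way to flush and lock all tables.

The lock belongs to one client session, so run_sql must keep the same connection open until UNLOCK TABLES.

```python
import subprocess
from typing import Callable, List

def snapshot_backup(run_sql: Callable[[str], None],
                    run_cmd: Callable[[List[str]], None],
                    lv_path: str = "/dev/vg0/mysql",   # placeholder volume
                    snap_name: str = "mysql_snap",     # placeholder name
                    snap_size: str = "1G") -> None:
    """Lock, snapshot, unlock: the short critical section of an LVM backup.

    run_sql must send every statement over the SAME connection, because
    the global read lock is released when that session ends.
    """
    # Flush tables to disk and block writes.
    run_sql("FLUSH TABLES WITH READ LOCK")
    try:
        # Create a copy-on-write snapshot of the data directory's volume.
        run_cmd(["lvcreate", "--snapshot", "--size", snap_size,
                 "--name", snap_name, lv_path])
    finally:
        # Release the lock even if lvcreate failed.
        run_sql("UNLOCK TABLES")
    # Mounting and copying the snapshot, then removing it with lvremove,
    # can happen afterwards while the server runs normally (not shown).

def run_cmd_for_real(argv: List[str]) -> None:
    """Example run_cmd: raises CalledProcessError if the command fails."""
    subprocess.run(argv, check=True)
```

The try/finally matters more than it looks. A failed snapshot must not leave the server unable to accept writes.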
Below is the list of changes since version 0.6. You may wonder what happened to version 0.7. It had a rather short life cycle: I was told about a bug and fixed it before I announced 0.7 more widely.
Updated packages are available from the home page and via the openSUSE Build Service as usual. Updated packages for Debian/Ubuntu and Gentoo Linux should also be available shortly. Enjoy!
Speaking of LVM snapshot backups: I will be giving a talk about this subject at our MySQL Conference 2008 in Santa Clara, CA next week. If you are curious about how MySQL can be backed up using this technology, please consider stopping by!
Maatkit contains essential command-line utilities for MySQL, such as a table checksum tool and query profiler. It provides missing features such as checking slaves for data consistency, with emphasis on quality and scriptability.
This release contains major bug fixes and new features. Some of the changes are not backwards-compatible. It also contains new tools to help you discover replication slaves and move them around the replication hierarchy.
Changelog for mk-archiver: 2008-03-16: version 1.0.8
* Added --setvars option (bug #1904689, bug #1911371).
* Added --charset option (bug #1877548).
* Changed short form of --analyze to -Z to avoid conflict with --charset.

Changelog for mk-deadlock-logger: 2008-03-16: version 1.0.9
* Added --setvars option (bug #1904689, bug #1911371).
* Added 'A' part to DSNs (bug #1877548).

Changelog for mk-duplicate-key-checker: 2008-03-16: version 1.1.5
* Added --setvars option (bug #1904689, bug #1911371).
* Added --charset option (bug #1877548).

Changelog for mk-find: 2008-03-16: version 0.9.10
* Added --setvars option (bug #1904689, bug #1911371).
* Added --charset option (bug #1877548).

Changelog for mk-heartbeat: 2008-03-16: version 1.0.8
* Added --setvars option (bug #1904689, bug #1911371).
* Added --charset option (bug #1877548).

Changelog for mk-parallel-dump: 2008-03-16: version 1.0.7
* Added --setvars option (bug #1904689, bug #1911371).
* Added --charset option (bug #1877548).
* A global database connection was re-used by children, causing a hang.

Changelog for mk-parallel-restore: 2008-03-16: version 1.0.6
* Added --setvars option (bug #1904689, bug #1911371).
* Changed --charset to be compatible with other tools (bug #1877548).

Changelog for mk-query-profiler: 2008-03-16: version 1.1.9
* Added --setvars option (bug #1904689, bug #1911371).
* Added --charset option (bug #1877548).

Changelog for mk-show-grants: 2008-03-16: version 1.0.9
* Added --setvars option (bug #1904689, bug #1911371).
* Added --charset option (bug #1877548).

Changelog for mk-slave-delay: 2008-03-16: version 1.0.6
* Added --setvars option (bug #1904689, bug #1911371).
* Added 'A' part to DSNs (bug #1877548).

Changelog for mk-slave-find: 2008-03-16: version 1.0.0
* Initial release.

Changelog for mk-slave-move: 2008-03-16: version 0.9.0
* Initial release.

Changelog for mk-slave-prefetch: 2008-03-16: version 1.0.1
* Added --setvars option (bug #1904689, bug #1911371).
* Added --charset option (bug #1877548).

Changelog for mk-slave-restart: 2008-03-16: version 1.0.6
* Added --setvars option (bug #1904689, bug #1911371).
* Added --charset option (bug #1877548).
* Added logic to repair tables, and rewrote a lot of code.
* Added --always option, disabled by default. Not backwards compatible.
* --daemonize did not work.
* --quiet caused an undefined variable error.

Changelog for mk-table-checksum: 2008-03-16: version 1.1.26
* Added --setvars option (bug #1904689, bug #1911371).
* Added 'A' part to DSNs (bug #1877548).
* Added --unique option to mk-checksum-filter.
* The exit status from mk-checksum-filter was always 0.
* mk-table-checksum now prefers to discover slaves via SHOW PROCESSLIST.

Changelog for mk-table-sync: 2008-03-16: version 1.0.6
* --chunksize was not being converted to rowcount (bug #1902341).
* Added --setvars option (bug #1904689, bug #1911371).
* Deprecated the --utf8 option in favor of the A part in DSNs.
* Mixed-case identifiers caused case-sensitivity issues (bug #1910276).
* Prefer SHOW PROCESSLIST when looking for slaves of a server.

Changelog for mk-visual-explain: 2008-03-16: version 1.0.7
* Added --setvars option (bug #1904689, bug #1911371).
* Added --charset option (bug #1877548).
mysql> CREATE TABLE t1 (grp INT, a DOUBLE);
Query OK, 0 rows affected (0.00 sec)
mysql> INSERT INTO t1 VALUES (1,1), (2,2), (2,3), (3,4), (3,5), (3,6);
Query OK, 6 rows affected (0.00 sec)
Records: 6 Duplicates: 0 Warnings: 0
mysql> CREATE FUNCTION test.agg_result(value DOUBLE, grp INT) RETURNS CHAR(128)
-> LANGUAGE Perl NO SQL EXTERNAL NAME 'Foo::aggregate_result';
Query OK, 0 rows affected (0.01 sec)
mysql> CREATE FUNCTION test.agg_add(value DOUBLE, grp INT) RETURNS DOUBLE
-> LANGUAGE Perl NO SQL EXTERNAL NAME 'Foo::aggregate_add';
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT COUNT(a),
-> CAST(AVG(a) AS DECIMAL(7,3)) 'AVG',
-> CAST(VARIANCE(a) AS DECIMAL(7,3)) 'VAR',
-> CAST(STD(a) AS DECIMAL(7,3)) 'STD',
-> test.agg_result(MAX(test.agg_add(a,grp)),grp) 'TEST'
-> FROM t1 GROUP BY grp;
+----------+-------+-------+-------+---------------------------------------+
| COUNT(a) | AVG | VAR | STD | TEST |
+----------+-------+-------+-------+---------------------------------------+
| 1 | 1.000 | 0.000 | 0.000 | count=1 avg=1.000 var=0.000 std=0.000 |
| 2 | 2.500 | 0.250 | 0.500 | count=2 avg=2.500 var=0.250 std=0.500 |
| 3 | 5.000 | 0.667 | 0.816 | count=3 avg=5.000 var=0.667 std=0.816 |
+----------+-------+-------+-------+---------------------------------------+
3 rows in set (0.04 sec)
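The session above appears to rely on a MySQL build that accepts LANGUAGE Perl stored functions. My reading is that Foo::aggregate_add runs once per row and Foo::aggregate_result runs once per group. The Perl code isn't shown, so here is a hypothetical Python sketch of the same accumulate-then-finalize pattern. It uses population variance, which is what MySQL's VARIANCE() and STD() compute, and it reproduces the TEST column for the rows in t1.

```python
import math
from collections import defaultdict

class GroupAggregates:
    """Accumulate values per group, then format count/avg/var/std once per group."""

    def __init__(self):
        self.values = defaultdict(list)

    def add(self, value, grp):
        # Per-row step (the role agg_add seems to play): remember the value.
        self.values[grp].append(value)
        return value

    def result(self, grp):
        # Per-group step (the role agg_result seems to play): compute and reset.
        vals = self.values.pop(grp)
        n = len(vals)
        avg = sum(vals) / n
        var = sum((v - avg) ** 2 for v in vals) / n   # population variance
        std = math.sqrt(var)
        return f"count={n} avg={avg:.3f} var={var:.3f} std={std:.3f}"

if __name__ == "__main__":
    rows = [(1, 1), (2, 2), (2, 3), (3, 4), (3, 5), (3, 6)]   # (grp, a) as in t1
    agg = GroupAggregates()
    for grp, a in rows:
        agg.add(a, grp)
    for grp in (1, 2, 3):
        print(agg.result(grp))
```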
This release contains minor bug fixes and new features. Besides the little bug fixes, there's a fun new feature in mk-heartbeat: it can auto-discover slaves recursively, and show the replication delay on all of them, to wit:
baron@keywest ~ $ mk-heartbeat --check --host master -D rkdb --recurse 10
master 0
slave1 1
slave2 1
slave3 4
(Not actual results. Your mileage may vary. Closed course, professional driver. Do not attempt).
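Recursive discovery amounts to a depth-limited walk of the replication tree. Here is a hypothetical Python sketch with these assumptions:

- get_slaves stands in for whatever the real tool uses to find slaves. The changelogs mention SHOW PROCESSLIST.
- get_delay stands in for reading the heartbeat delay.
- The topology and delays are made-up demo data that echo the sample output.
- max_depth plays the role I assume the number after --recurse plays.

```python
# Hypothetical demo data echoing the sample output above (not real results).
TOPOLOGY = {"master": ["slave1", "slave2"], "slave2": ["slave3"]}
DELAYS = {"master": 0, "slave1": 1, "slave2": 1, "slave3": 4}

def walk(server, get_slaves, get_delay, max_depth, depth=0, seen=None):
    """Depth-first walk of the replication tree, returning (server, delay) pairs.

    max_depth limits how many levels below the starting server are visited;
    'seen' guards against visiting a server twice (e.g. circular replication).
    """
    if seen is None:
        seen = set()
    if depth > max_depth or server in seen:
        return []
    seen.add(server)
    rows = [(server, get_delay(server))]
    for slave in get_slaves(server):
        rows.extend(walk(slave, get_slaves, get_delay, max_depth, depth + 1, seen))
    return rows

if __name__ == "__main__":
    for name, delay in walk("master", lambda s: TOPOLOGY.get(s, []), DELAYS.get, 10):
        print(name, delay)
```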
Nothing else in this release is very exciting. I just wanted to get the bug fixes out there.
This release contains bug fixes and new features. It also contains a new tool: my implementation of Paul Tuckfield's relay log pipelining idea. I have had quite a few responses to that blog post, and requests for the code. So I'm releasing it as part of Maatkit.
I dashed off a hasty post about speeding up replication slaves, and gave no references or explanation. That's what happens when I write quickly! This post explains what the heck I was talking about.
Paul Tuckfield of YouTube has spoken about how he sped up his slaves by pre-fetching the slave's relay logs. I wrote an implementation of this, tried it on my workload, and it didn't speed them up. (I didn't expect it to; I don't have the right workload.) I had a few email exchanges with Paul and some other experts on the topic, and we agreed my workload isn't going to benefit from the pre-fetching.
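As the idea is usually described, a helper reads ahead in the relay log and runs SELECTs that touch the same rows as upcoming writes. The pages are then already in memory when the slave SQL thread reaches those writes. Below is a rough, hypothetical Python sketch of the rewriting step for simple single-table UPDATE and DELETE statements. It is regex-based, so it does not handle multi-table statements, subqueries, or string literals that contain WHERE. It is not the implementation mentioned above.

```python
import re

# Simple single-table statements only; a real tool would need a proper parser.
UPDATE_RE = re.compile(
    r"^\s*UPDATE\s+(?P<table>\S+)\s+SET\s+.+?\s+WHERE\s+(?P<where>.+?);?\s*$",
    re.IGNORECASE | re.DOTALL)
DELETE_RE = re.compile(
    r"^\s*DELETE\s+FROM\s+(?P<table>\S+)\s+WHERE\s+(?P<where>.+?);?\s*$",
    re.IGNORECASE | re.DOTALL)

def to_prefetch_select(stmt: str):
    """Turn an upcoming write into a read that touches the same rows.

    Returns None for anything that is not a simple UPDATE/DELETE with a WHERE.
    """
    for pattern in (UPDATE_RE, DELETE_RE):
        m = pattern.match(stmt)
        if m:
            # SELECT * (rather than SELECT 1) aims to read the rows themselves,
            # not just an index that happens to cover the WHERE clause.
            return f"SELECT * FROM {m.group('table')} WHERE {m.group('where')}"
    return None

if __name__ == "__main__":
    print(to_prefetch_select("UPDATE users SET last_login = NOW() WHERE id = 42;"))
```

Every such SELECT adds read load to the slave. If the data already fits in memory, the extra reads can hurt rather than help, which is one reason to be cautious about releasing this.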
In the meantime, I've got a pretty sophisticated implementation of Paul's idea just sitting around, unused. I haven't released it for the same reasons Paul didn't release his: I'm afraid it might do more harm than good.
However, if you'd like the code, send me an email at [baron at this domain] and I'll share the code with you. In return, I would like you to tell me about your hardware and your workload, and to do at least some rudimentary benchmarks to show whether it works or not on your workload. If I find that this is beneficial for some people, I may go ahead and release the code as part of Maatkit.
My posts lately have been mostly progress reports and release notices. That's because we're in the home stretch on the book, and I don't have much spare time. However, a lot has also been changing with Maatkit, and I wanted to take some time to write about it properly.
This release contains bug fixes and new features. Click through to the full article for the details. I'll also write more about the changes in a separate article.