Content Tagged with Perl + SQL

Percona wants to hire a Maatkit developer

Percona is looking to hire someone to develop Maatkit, among other things.

If I weren’t having so much fun being the consulting team lead, I’d be doing it myself. (In fact, I’m still hacking on it a lot. Got some pretty fun stuff done this weekend.) I don’t know what the rest of the world thinks, but I think Maatkit is a damn enjoyable project to work on. Hopefully someone else will have the same kind of mindset and want to get paid for it, unlike poor working-on-the-weekends me.

I’m not stepping away from the project. It’s just grown a lot, and there is room and money to grow it much more. This is actually the best compliment to the project: that it is worth hiring someone to keep improving it. Lots of people are using it, and there’s a lot of stealth-mode stuff I/we want to do with it too.

On a related note, who wants me to order another batch of Maatkit t-shirts? I’ve gotten quite a few questions about it.


MySQL: Planet MySQL

What it’s like to write a technical book, continued

My post on what it’s like to write a technical book was a stream-of-consciousness look at the process of writing High Performance MySQL, Second Edition. I got a lot of responses from it and learned some neat things I wouldn’t have learned if I hadn’t written the post. I also got a lot of questions, and my editor wrote a response too. I want to follow up on these things.

Was I fair, balanced and honest?

I really intended to write the post as just “here’s what it’s like, just so you’re prepared.” But at some point I got really deep into it and lost my context. That’s when I started to write about the things that didn’t go so smoothly with the publisher, and some of these things had a little extra sting in them that I would have done well to edit out.

All of us are human and the process wasn’t that bad, all things considered — the book was just a massive project that put huge demands on all of us and stressed everything from the capabilities of our chosen tools to our patience. As the editor points out in his response to my blog post, this is precisely why nobody else has ever been able to pull this off. This book stands head and shoulders above the crowd. It’s just hard to write, and very few people in the world actually have the knowledge to do it, much less the time, inclination, and ability.

Everything I said was (I believe) factual and correct, although as the editor points out there are different stories behind them. I also want to mention that I’d shared all those concerns with my editor; I avoid criticizing people behind their backs. In hindsight, throwing all of my concerns onto a blog post without warning isn’t the kind of thing I like to do either.

So I believe I was honest, but unfair to the editor. I’ve apologized to him. And by the way, yes I would work with him again, and I fully expect that it would be easier because I have learned more about the process.

I ran this post by my editor before publishing it.

A deeper explanation of my heuristics

Several people asked me to say more about my heuristics for improving the quality of the writing. I’ve already explained many of them, but here’s more:

(were|was|is|are|has been|be)( [a-zA-Z]+)? [a-zA-Z]+ed\>
This regular expression can help find some occurrences of passive voice. It finds a word or phrase that’s some variation on the verb “to be,” usually in the past tense; followed by an optional word (probably an adjective); followed by another word that ends in “-ed,” which is also potentially a verb in the past tense. This is not the only way to write in the passive voice, but it’s kind of the classic. Here are some examples: “the blog post was posted,” “the benchmark was rapidly created.”
(were|was|is|are|ha[sd] been|be)( [a-zA-Z]+)? [a-zA-Z]+e[dn]\>
An enhanced version. As I looked at the preceding point, I saw some other simple examples it doesn’t catch. For example, it doesn’t catch “had been” and it doesn’t catch verbs like “written.” Ironically, the first thing that came to mind as I thought about examples was “the book had been written.”
while|since
There’s nothing wrong with these words, except when they’re used in lieu of “because” to indicate causality. This is a problem for non-native English speakers, because these words have a temporal meaning too. For example, “Since MySQL 4.1 has no stored procedures, you have to use MySQL 5.0 if you want stored procedures.” If you aren’t a native English speaker, and even if you are, it’s easy to read that as “MySQL has had no stored procedures since version 4.1, …” and then when your eyes reach the part about MySQL 5.0, it makes no sense. My rule for this is to say “because” when I mean “because.”
using
Real examples: “Using MyISAM tables works very well” can become “MyISAM tables work very well.” And “A final possibility is simply to switch to using a table” can become “Finally, you can use a table” instead.
in order
The phrase “in order to” can almost always be replaced by “to.” It also tends to show a rough transition between the first and second phrases in a sentence. Perhaps these phrases should be integrated into a single phrase. “You can use this regex in order to find poorly constructed sentences” can become “this regex can find poorly constructed sentences” or “You can find poorly constructed sentences with this regex.” I prefer the latter; it is very direct, and that straightforward, simple writing style is really important in complex subject matter.
of course|without saying|obviously|clearly|needless
It goes without saying, but of course these words obviously point out when I’m writing stupid things that I clearly need to take a closer look at. Needless to say, most of the phrases in this paragraph are indeed needless to say. They are a red flag for lazy writing, such as glossing over a difficult point that should instead be explained — hard work, but necessary.
whether
I found quite a few places where the phrase “whether or not” was used. This can be shortened: “to see whether or not the disk is the problem” can become “to see whether the disk is the problem.” But better yet, the phrase often glues together poorly written phrases into an awkward sentence, just as “in order to” does. Can “whether” be replaced by “if?” Or does the sentence or paragraph just need to be reworked completely?
allow
This word can usually be replaced by “let.” “The remaining settings allow MySQL to allocate more RAM” can become “The remaining settings let MySQL allocate more RAM.” Occasionally, it is part of a larger phrase or thought that needs to be shortened and clarified. “When nobody is writing, readers obtain read locks that allow other readers to do the same” became “When nobody is writing, readers can obtain read locks, which don’t conflict with other read locks.”

ensure
I found that this word is often subtly misused. It really means “guarantee” but is often used as “double-check” or “make sure.” I don’t want to be too dogmatic about this word: its usage in modern English is complex (see the usage note on assure here; that in itself might be a reason to avoid it). But I found many places where I wanted to remove it in favor of an explicit instruction that tells the reader to take action. “Ensure” as an instruction is kind of a politically correct way to tell someone to do something, and I’m not afraid to just tell you to do it if I think you need to. I don’t want you to miss my meaning.
only
I have a habit of using this word incorrectly. “I only have ten fingers” should be “I have only ten fingers.”
as (we|you)|again,
These phrases usually show a place where the writing is confused and redundant. They show up in places like “as we already said, you should tune your server” and “again, you should tune your server.” Any instruction to the reader to break the narrative flow is a place to examine whether the concepts are in the right order. Cross-references, footnotes, and reminders are not always evil, but they’re to be regarded with suspicion.
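
The patterns above use Vim's `\>` end-of-word anchor. As a minimal sketch, assuming you want to run the same checks outside the editor, here is a Python translation that uses `\b` instead. The function name and the selection of filler phrases are illustrative and are not taken from the author's scripts.

```python
import re

# Python translation of the enhanced pattern: a form of "to be", an optional
# intervening word, then a word ending in -ed or -en. Vim's \> becomes \b.
PASSIVE = re.compile(
    r"\b(were|was|is|are|ha[sd] been|be)( [a-zA-Z]+)? [a-zA-Z]+e[dn]\b"
)

# A few of the filler phrases from the list above (illustrative selection).
FILLER = re.compile(
    r"\b(in order to|of course|obviously|clearly|needless to say|whether or not)\b",
    re.IGNORECASE,
)


def flag_sentence(sentence):
    """Return a list of (kind, matched_text) pairs worth a second look."""
    hits = [("passive", m.group(0)) for m in PASSIVE.finditer(sentence)]
    hits += [("filler", m.group(0)) for m in FILLER.finditer(sentence)]
    return hits
```

Like the original regexes, this only flags candidates. A phrase such as "is indeed" will match even though it is not passive, so every hit still needs a human to read it.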

Readability metrics

The tools I used to find sentences and phrases that score badly on some readability metric were pretty helpful to me as I tightened the writing up more and more. Nobody has reviewed the book yet, but I think when they do, they’ll be unlikely to mention “oh, and by the way the writing is wonderfully compact!” If we pulled this off right, you won’t notice that the writing is clear and compact. Writing is like a stereo system: you’re supposed to hear the music, not the speakers.

Anyway, my point is that we expanded the first edition’s actual coverage many times over, and ended up with only 658 pages of actual material. So the writing is much more compressed, and to do that you have to find and eliminate confusing writing. Confusing writing usually means that the concepts don’t flow clearly, and it takes more words to say the same thing because you’re kind of bumbling about, gesturing at your meaning from several angles instead of saying it clearly just once.

Here’s how I analyzed each chapter:

  • I used OpenOffice’s export feature to export the file to MediaWiki format. This is a plain-text markup format. I forget now why I didn’t just export to text, but there was something about MediaWiki format that made it easier to munge with Perl.
  • I ran my clean_text.pl program against the exported file to convert the format to a simpler one without special characters and markup. Some of the markup (footnotes, for example) stayed in the text and confused the metrics, but that’s life.
  • I ran my analyze_text.pl program against this to find the “worst” places.

As I wrote in my previous post, the analyzer uses a combination of readability metrics and “other stuff” to measure the badness of each sentence and paragraph. It aggregates sentences and paragraphs by the metrics. I calculated the number of words, percent of complex words, syllables per word, number of sentences, words per sentence, and a bunch of other things, as well as the standard readability metrics. Each sentence and paragraph got scored on these. Then I printed overall metrics, and sorted the sentences and paragraphs worst-first and printed out a snippet of the offending text. Here’s a sample of chapter 3’s metrics (originally numbered chapter 4) at some intermediate stage in the writing process.

This was a lot of work. If I had been writing with Vim, I could have done better. I could have used the compiler integration and set my “make” program to the analysis program. If you use Vim and you don’t know about this, it’s a pity. My next book will be written in Vim, by the way.

Actually, I probably could have done better regardless, but this was good enough. I just searched for the snippets and then examined what was going on.

There were some false positives. For example, bullet-points often scored badly on the readability metrics, and so a five-word bullet point item would look like terrible writing just because it was short enough that it had a high percentage of complex words. It’s not an exact science. Maybe next time will be better.
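
A rough sketch of this kind of scoring follows. It is not the author's analyze_text.pl. It assumes a crude vowel-group syllable counter, treats words of three or more syllables as "complex," and uses the Gunning fog index as a stand-in for the unspecified "standard readability metrics." It also shows why a very short item can look like terrible writing.

```python
import re


def count_syllables(word):
    """Crude estimate: count runs of vowels (an assumption, not a dictionary lookup)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def sentence_metrics(sentence):
    """Score one sentence; returns None if it contains no words."""
    words = re.findall(r"[A-Za-z']+", sentence)
    if not words:
        return None
    complex_words = [w for w in words if count_syllables(w) >= 3]
    pct_complex = 100.0 * len(complex_words) / len(words)
    # Gunning fog applied to a single sentence:
    # 0.4 * (words per sentence + percent complex words)
    fog = 0.4 * (len(words) + pct_complex)
    return {"text": sentence, "words": len(words),
            "pct_complex": pct_complex, "fog": fog}


def worst_first(text, snippet_len=40):
    """Split text into sentences and list (score, snippet) pairs, worst first."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    scored = [m for m in map(sentence_metrics, sentences) if m]
    scored.sort(key=lambda m: m["fog"], reverse=True)
    return [(round(m["fog"], 1), m["text"][:snippet_len]) for m in scored]
```

A two-word bullet such as "Replication configuration" is 100% complex words under this counter, so it outscores a long sentence made of short words. That is the false positive described above.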

If you’d like to see the source code, here’s the clean_text.pl and here’s the analyze_text.pl. Enjoy!


MySQL: Planet MySQL

Get Maatkit fast from the command line

I have been using Maatkit in a different way since I joined Percona as a consultant. When I’m working on a system now, it’s a new, unfamiliar system — not one where I have already installed my favorite programs. And that means I want to grab my favorite productivity tools fast.

I intentionally wrote the Maatkit tools so they don’t need to be “installed.” You just run them, that’s all. But I never made them easy to download.

I fixed that. Now, at the command line, you can just run this:

wget http://www.maatkit.org/get/mk-table-sync

Now it’s ready to run. Behind the scenes are some Apache mod_rewrite rules, a Perl script or two, and Subversion. When you do this, you’re getting the latest code from Subversion’s trunk.[1][2] (I like to run on the bleeding edge. Releases are for people who want to install stuff.)

Because there’s some Perl magic behind it, I made it even easier — it does pattern-matching on partial names and Does The Right Thing:

baron@kanga:~$ wget http://www.maatkit.org/get/sync
--21:38:50--  http://www.maatkit.org/get/sync
           => `sync'
Resolving www.maatkit.org... 64.130.10.15
Connecting to www.maatkit.org|64.130.10.15|:80... connected.
HTTP request sent, awaiting response... 302 Moved
Location: http://www.maatkit.org/get/mk-table-sync [following]
--21:38:50--  http://www.maatkit.org/get/mk-table-sync
           => `mk-table-sync'
Connecting to www.maatkit.org|64.130.10.15|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/x-perl]

    [      <=>                            ] 163,259      136.51K/s             

21:38:51 (136.13 KB/s) - `mk-table-sync' saved [163259]

The redirection is there because otherwise wget will save the file under the name ‘sync’ instead of ‘mk-table-sync’.
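
The partial-name lookup can be sketched as follows. This is an illustration only: the real service uses mod_rewrite and Perl, and the tool list, the function name, and the matching rule (a unique substring match, with ambiguous names rejected) are all assumptions.

```python
# Hypothetical subset of tool names; the real list lives on the server.
TOOLS = ["mk-archiver", "mk-find", "mk-heartbeat",
         "mk-table-checksum", "mk-table-sync"]


def resolve(requested, tools=TOOLS):
    """Map a requested name to an (http_status, tool_name) pair.

    200: exact name, serve the file directly.
    302: unique partial match; redirect so the client saves the full name.
    404: no match, or more than one tool matches (assumed behavior).
    """
    if requested in tools:
        return 200, requested
    matches = [t for t in tools if requested in t]
    if len(matches) == 1:
        return 302, matches[0]
    return 404, None
```

With this rule, `resolve("sync")` yields a redirect to mk-table-sync, matching the wget transcript above, while `resolve("table")` is rejected because two tools contain that string.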

And if you’ve forgotten which tools exist, you can just click on over to http://www.maatkit.org/get/ and see.

A quick poll: instead of getting the latest trunk, should this give you the code from the last release? I can do that, if you want.

[1] OK, it’s only refreshed every hour. So you’re getting code that’s up to an hour old.

[2] update: now /get/foo gets the latest release, and /trunk/foo gets the latest trunk code.


MySQL: Planet MySQL

AutoDia - Automatic Dia / UML generator

Create Dia (or GraphViz, or dot, or VCG) UML Diagrams from Source code, directories of source code or even xml, by using your own or included Handlers.

UML: del.icio.us tag/uml

Maatkit in RHEL and CentOS

Update: Karanbir says “Just one thing to keep in mind is that we dont want too many people using it from the Testing repository - we only need enough feedback to move it from testing to stable ( and to be honest, there are already 8 people who have said yes it works - so move to stable should happen within the next 24 - 48 hrs ). Once the package is in stable, users on CentOS4 and 5 wont need to do anything more than just ‘yum install maatkit’ and it will install for them.”

At least one person (Karanbir Singh) is working to get Maatkit into the CentOS repositories, and I believe there might be movement towards RHEL also. From an email to the Maatkit discussion list a little while ago:

I am in the process of getting maatkit into the CentOS-Extras repositories. The first step for that is that every package needs to go into a CentOS-Testing repo and feedback is required from the project and users on its stability / usability and packaging quality.

maatkit-1887 is now available in the CentOS-Testing[*] repo’s and as soon as we can get some feedback ( needs to be 5 different people, none of whom can be CentOS Developers ) - the packages will move into the main repository so that all users can get access.

I’d appreciate it if people on this were able to give those packages a go and let me know if there are any issues. You can leave feedback :

  • via the maatkit-discuss mailing list (http://sourceforge.net/mailarchive/forum.php?forum_name=maatkit-discuss)
  • on the centos-devel list ( http://lists.centos.org ) or
  • http://bugs.centos.org/ against category ‘maatkit’

[*] : Info about the Testing repo and howto set it up on your machine : http://wiki.centos.org/Repositories

If you’re interested in getting Maatkit into these repositories, please take a moment and give the requested feedback. I can’t do it because it would be a conflict of interest for the main developer to assert that the code is stable and usable.


MySQL: Planet MySQL

Improved Cacti monitoring templates for MySQL

Download MySQL Cacti templates

As promised, I’ve created some improved software for monitoring MySQL via Cacti. I began using the de facto MySQL Cacti templates a while ago, but found some things I needed to improve about them. As time passed, I rewrote everything from scratch. The resulting templates are much improved.

You can grab the templates by browsing the source repository on the project’s homepage.

In no particular order, here are some things I improved:

  • Standard polling interval and graph size by default.
  • Full captions on every graph; you don’t have to guess at how big the values are. Each graph has current, max, and average values printed at the bottom for every value on it.
  • Much more data is captured. I’ve graphed almost everything I could think of.
  • The graphs are grouped better. Most graphs have only related values. There are some exceptions, but not many.
  • The templates don’t hijack your existing installation. They don’t depend on or alter anything in your default Cacti installation.
  • The script that gathers the data is totally rewritten from scratch, and much improved. For example, the math works on 32-bit systems. It has caching built-in so each poll cycle results in just one request to the server, instead of one request per graph. (This is a weakness of Cacti I’m trying to work around). It also has debugging aids and other good coding stuff.
  • By default, it assumes you have the same username and password across every server you’re monitoring, so you don’t have to fill in a username and password for every single graph you create.
  • One data template == one graph template. This helps work around another Cacti limitation.
  • Lots more. Honestly I can’t really remember everything I’ve done. I’m sure you’ll help me remember by asking me how to get X feature working the way you want, and I’ll go “oh, yeah, that’s another thing I improved…”
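
The caching idea in the list above, one request to the server per poll cycle instead of one per graph, can be sketched like this. It is not the actual gathering script: the JSON cache file, the `max_age` value, and the caller-supplied `fetch` function are assumptions made for illustration.

```python
import json
import os
import tempfile
import time


def cached_status(host, fetch, cache_dir=None, max_age=150):
    """Return status values for host, asking the server at most once per max_age seconds.

    fetch is a hypothetical caller-supplied function that performs the one
    expensive request and returns a JSON-serializable dict.
    """
    cache_dir = cache_dir or tempfile.gettempdir()
    path = os.path.join(cache_dir, f"{host}-mysql-stats.json")
    try:
        if time.time() - os.path.getmtime(path) < max_age:
            with open(path) as fh:
                return json.load(fh)
    except (OSError, ValueError):
        pass  # missing or corrupt cache: fall through and refetch

    data = fetch(host)
    # Write to a temporary file and rename it into place, so a concurrent
    # poller never reads a half-written cache file.
    fd, tmp = tempfile.mkstemp(dir=cache_dir)
    with os.fdopen(fd, "w") as fh:
        json.dump(data, fh)
    os.replace(tmp, path)
    return data
```

Each graph's poller calls `cached_status` with the same host; only the first call in a cycle reaches the server, and the rest read the file.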

Cacti templates are very laborious to create if they’re complex at all; it takes a long time and is very error-prone. Instead of doing it through Cacti’s web interface and exporting a huge XML file, I eliminated the redundancies and created a small, easy-to-maintain file from which I generate the XML template with a Perl script. This gives the added benefit of letting me (or you) generate templates with different parameters such as polling interval or graph size. The README file has the full details. However, I’ve pre-generated a set of templates that matches Cacti’s defaults, so you can probably just use that.

This has taken a lot of time. In particular, I spent a lot of time working on it at my former employer, The Rimm-Kaufman Group (kudos to them for letting me open-source the work) and I just spent most of my weekend writing the scripts to convert from the compact format to XML templates, so it’s possible to maintain these beasts. Plus I had to develop the compact format, too. This took a lot of time because I had to understand the Cacti data model, which is pretty complex.

Please enter issue reports for bugs, feature requests, etc at the Google project homepage, not in the comments of this blog post. I do not look through comments on my blog when I’m trying to remember what I should be working on for a software project.

If these templates help you and you feel like visiting my Amazon.com wishlist and sending something my way, I’d appreciate it!

PS: You may also be interested in Alexey Kovyrin’s list of templates for monitoring servers.


MySQL: Planet MySQL

Maatkit version 1877 released

Download Maatkit

Maatkit contains essential command-line utilities for MySQL, such as a table checksum tool and query profiler. It provides missing features such as checking slaves for data consistency, with emphasis on quality and scriptability.

This release contains major bug fixes and new features. Some of the changes are not backwards-compatible. It also contains new tools to help you discover replication slaves and move them around the replication hierarchy.

Changelog for mk-archiver:

2008-03-16: version 1.0.8

   * Added --setvars option (bug #1904689, bug #1911371).
   * Added --charset option (bug #1877548).
   * Changed short form of --analyze to -Z to avoid conflict with --charset.

Changelog for mk-deadlock-logger:

2008-03-16: version 1.0.9

   * Added --setvars option (bug #1904689, bug #1911371).
   * Added 'A' part to DSNs (bug #1877548).

Changelog for mk-duplicate-key-checker:

2008-03-16: version 1.1.5

   * Added --setvars option (bug #1904689, bug #1911371).
   * Added --charset option (bug #1877548).

Changelog for mk-find:

2008-03-16: version 0.9.10

   * Added --setvars option (bug #1904689, bug #1911371).
   * Added --charset option (bug #1877548).

Changelog for mk-heartbeat:

2008-03-16: version 1.0.8

   * Added --setvars option (bug #1904689, bug #1911371).
   * Added --charset option (bug #1877548).

Changelog for mk-parallel-dump:

2008-03-16: version 1.0.7

   * Added --setvars option (bug #1904689, bug #1911371).
   * Added --charset option (bug #1877548).
   * A global database connection was re-used by children, causing a hang.

Changelog for mk-parallel-restore:

2008-03-16: version 1.0.6

   * Added --setvars option (bug #1904689, bug #1911371).
   * Changed --charset to be compatible with other tools (bug #1877548).

Changelog for mk-query-profiler:

2008-03-16: version 1.1.9

   * Added --setvars option (bug #1904689, bug #1911371).
   * Added --charset option (bug #1877548).

Changelog for mk-show-grants:

2008-03-16: version 1.0.9

   * Added --setvars option (bug #1904689, bug #1911371).
   * Added --charset option (bug #1877548).

Changelog for mk-slave-delay:

2008-03-16: version 1.0.6

   * Added --setvars option (bug #1904689, bug #1911371).
   * Added 'A' part to DSNs (bug #1877548).

Changelog for mk-slave-find:

2008-03-16: version 1.0.0

   * Initial release.

Changelog for mk-slave-move:

2008-03-16: version 0.9.0

   * Initial release.

Changelog for mk-slave-prefetch:

2008-03-16: version 1.0.1

   * Added --setvars option (bug #1904689, bug #1911371).
   * Added --charset option (bug #1877548).

Changelog for mk-slave-restart:

2008-03-16: version 1.0.6

   * Added --setvars option (bug #1904689, bug #1911371).
   * Added --charset option (bug #1877548).
   * Added logic to repair tables, and rewrote a lot of code.
   * Added --always option, disabled by default.  Not backwards compatible.
   * --daemonize did not work.
   * --quiet caused an undefined variable error.

Changelog for mk-table-checksum:

2008-03-16: version 1.1.26

   * Added --setvars option (bug #1904689, bug #1911371).
   * Added 'A' part to DSNs (bug #1877548).
   * Added --unique option to mk-checksum-filter.
   * The exit status from mk-checksum-filter was always 0.
   * mk-table-checksum now prefers to discover slaves via SHOW PROCESSLIST.

Changelog for mk-table-sync:

2008-03-16: version 1.0.6

   * --chunksize was not being converted to rowcount (bug #1902341).
   * Added --setvars option (bug #1904689, bug #1911371).
   * Deprecated the --utf8 option in favor of the A part in DSNs.
   * Mixed-case identifiers caused case-sensitivity issues (bug #1910276).
   * Prefer SHOW PROCESSLIST when looking for slaves of a server.

Changelog for mk-visual-explain:

2008-03-16: version 1.0.7

   * Added --setvars option (bug #1904689, bug #1911371).
   * Added --charset option (bug #1877548).

MySQL: Planet MySQL

Maatkit version 1753 released

Download Maatkit

This release contains minor bug fixes and new features. Besides the little bug fixes, there's a fun new feature in mk-heartbeat: it can auto-discover slaves recursively, and show the replication delay on all of them, to wit:

baron@keywest ~ $ mk-heartbeat --check --host master -D rkdb --recurse 10
master 0
slave1 1
slave2 1
slave3 4

(Not actual results. Your mileage may vary. Closed course, professional driver. Do not attempt).

Nothing else in this release is very exciting. I just wanted to get the bug fixes out there.

MySQL: Planet MySQL

Maatkit version 1709 released

This release contains bug fixes and new features. It also contains a new tool: my implementation of Paul Tuckfield's relay log pipelining idea. I have had quite a few responses to that blog post, and requests for the code. So I'm releasing it as part of Maatkit.

MySQL: Planet MySQL

How pre-fetching relay logs speeds up MySQL replication slaves

I dashed off a hasty post about speeding up replication slaves, and gave no references or explanation. That's what happens when I write quickly! This post explains what the heck I was talking about.

MySQL: Planet MySQL

Speed up your MySQL replication slaves

Paul Tuckfield of YouTube has spoken about how he sped up his slaves by pre-fetching the slave's relay logs. I wrote an implementation of this, tried it on my workload, and it didn't speed them up. (I didn't expect it to; I don't have the right workload). I had a few email exchanges with Paul and some other experts on the topic and we agreed my workload isn't going to benefit from the pre-fetching.

In the meantime, I've got a pretty sophisticated implementation of Paul's idea just sitting around, unused. I haven't released it for the same reasons Paul didn't release his: I'm afraid it might do more harm than good.

However, if you'd like the code, send me an email at [baron at this domain] and I'll share the code with you. In return, I would like you to tell me about your hardware and your workload, and to do at least some rudimentary benchmarks to show whether it works or not on your workload. If I find that this is beneficial for some people, I may go ahead and release the code as part of Maatkit.

MySQL: Planet MySQL

What is new in Maatkit

My posts lately have been mostly progress reports and release notices. That's because we're in the home stretch on the book, and I don't have much spare time. However, a lot has also been changing with Maatkit, and I wanted to take some time to write about it properly.

MySQL: Planet MySQL

Maatkit version 1674 released

This release contains bug fixes and new features. Click through to the full article for the details. I'll also write more about the changes in a separate article.

MySQL: Planet MySQL

Maatkit version 1579 released

This release contains bug fixes and new features. The biggest new feature, in my opinion, is a new sync algorithm for mk-table-sync. Now you can sync any table with an index more efficiently than previously. This is the return of the speed I promised earlier. (Though I haven't yet benchmarked it; I am very short on time these days. Your benchmarks and other contributions are welcome).

I'm finally feeling like the table sync tool is getting in good shape!

Changelog etc is in the full article.

MySQL: Planet MySQL

Maatkit version 1508 released

This release fixes a few bugs, adds minor features, and adds some debugging support to shared code. I'm working on the Nibble sync algorithm for mk-table-sync, and someone has found a few more bugs with mk-parallel-dump, but those might take me a while to complete.

MySQL: Planet MySQL

Maatkit on Ohloh

Sheeri wrote a post (now a 404 error) referring to Maatkit on Ohloh, which I had never heard of before. I took a look at what Ohloh thinks about Maatkit. It's kind of neat. Beyond just the obvious "social website" stuff that's all the rage these days, it actually looks at the project's SVN history, analyzes the codebase, and so on.

It also estimates 8 person-years of work have gone into the project, and says that at $55,000/year it would cost $450,702 to write the code as it currently exists, which is kind of funny. It took me a whole lot less than 8 years to write. (Perhaps this is why that salary strikes me as unrealistic).

It has a couple of other interesting things, like a visual timeline of source control commits, analysis of licenses it finds in the code, analysis of programming languages, and so on. Really pretty neat overall.

MySQL: Planet MySQL

Maatkit version 1417 released

Download Maatkit

Thanks again to all the great sponsors for my week of work on the kit!

This is the long-awaited "Baron worked on table sync" release. Hooray!

Please read the full blog post for important (very important!) information.

MySQL: Planet MySQL

Progress on Maatkit bounty, part 4

... I didn't get two-way sync done, and I didn't get the Nibble algorithm done. That much I expected. But I also didn't get the current work released tonight because I'm paranoid about breaking things. I'm trying to go through all the tools and write at least a basic test for them to be sure they can do the simplest "unit of work" (such as mk-find running and printing out that it finds the mysql.columns_priv table).

It's good that I'm doing this. I found that mk-heartbeat suddenly doesn't work on my Ubuntu 7.10 laptop. It goes into infinite sleep. Can anyone repro this and/or diagnose? The same code works fine on my Gentoo servers at work.

Hopefully I'll be able to release something very soon. Release early/often is fine, but "knowingly release brokenness" isn't in my code of conduct :)

MySQL: Planet MySQL

Progress on Maatkit bounty, part 3

This is the last day I'm taking off work to hack on mk-table-sync, and I thought it was time for (yet another) progress report. Here's what I have done so far. (Click through to the full article to read the details).

MySQL: Planet MySQL

Progress on Maatkit bounty, part 2

Ironically, the Stream algorithm I wrote as the simplest possible syncing algorithm does what the much more efficient algorithm I wrote some time ago can't do: sync a table without a primary key, as long as there are no duplicate rows. In fact, it's so dumb, it will happily sync any table, even if there are no indexes.

The flash of inspiration I had on Friday has turned out to be good...

MySQL: Planet MySQL

Progress on Maatkit bounty

My initial plans got waylaid! I didn't pull out the checksumming code first, because the code wasn't at all as I remembered it. Instead, I began writing code to handle the more abstract problem of accepting two sets of rows, finding the differences, and doing something with them. I'm ending up with a little more complicated system than I thought I would. However, it's also significantly simpler in some ways. Instead of just passing references to subroutines to use as callbacks, I'm object-ifying the entire synchronization concept...
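
The abstract problem described here (take two sets of rows, find the differences, and hand them to an object rather than to bare callbacks) might look roughly like this. The class and method names are hypothetical and this is not mk-table-sync's code. The sketch assumes rows are hashable tuples with no duplicates.

```python
class StatementCollector:
    """Hypothetical handler object: receives differences and records what to do."""

    def __init__(self):
        self.actions = []

    def missing_in_dest(self, row):
        # Row exists in the source only, so the destination needs it.
        self.actions.append(("INSERT", row))

    def extra_in_dest(self, row):
        # Row exists in the destination only, so it should be removed.
        self.actions.append(("DELETE", row))


def sync_rows(source_rows, dest_rows, handler):
    """Compare whole rows and report each difference to the handler."""
    source, dest = set(source_rows), set(dest_rows)
    for row in sorted(source - dest):
        handler.missing_in_dest(row)
    for row in sorted(dest - source):
        handler.extra_in_dest(row)
    return handler
```

Swapping in a different handler (one that prints SQL, one that executes it, one that only counts) changes what happens to the differences without touching the comparison code.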

MySQL: Planet MySQL

Maatkit bounty begins tomorrow

Tomorrow is the first of five days I will spend working on mk-table-sync, the data synchronization tool I developed as part of Maatkit. The first thing I’ll do is pull the row-checksumming code out into a module and write a unit test suite for it. I’ll probably add the code to the module [...]

MySQL: Planet MySQL

Maatkit version 1314 released

Maatkit (formerly MySQL Toolkit) contains essential command-line utilities for MySQL, such as a table checksum tool and query profiler. It provides missing features such as checking slaves for data consistency, with emphasis on quality and scriptability.

This release fixes several minor bugs. It also renames all the tools to avoid trademark violation.

MySQL: Planet MySQL

Maatkit version 1297 released

Maatkit (formerly MySQL Toolkit) version 1297 contains a significant update to MySQL Table Checksum (which will be renamed soon to avoid trademark violations). The changelog follows. What you don't see in the changelog is the unit test suite! I got a lot more of the code into modules that are tested and re-usable.

2007-11-18: version 1.1.19 

* Check for needed privileges on --replicate table before beginning. 
* Made some error messages more informative. 
* Fixed child process exit status with 8-bit right-shift. 
* Improved checksumming code auto-detects best algorithm and function. 
* Added --ignoreengine option; ignores federated and merge by default. 
* Added --columns and --checksum options. 
* Removed --chunkcol, --chunksize-exact, --index options. 
* --chunksize can be specified as a data size now. 
* Improved chunking algorithm handles more cases and uses fewer chunks. 
* Do not print --replcheck results for servers that are not slaves. 
* Create only one DB connection for each host, not one per host/tbl/chunk. 
* Code assumed backtick quoting, broke on SQL_MODE=ANSI (bug #1813030). 
* There were many potential bugs with database and table name quoting. 
* Child exit status errors could be masked by subsequent successes.
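
The "8-bit right-shift" entry refers to how a traditional POSIX wait status packs the child's exit code into the high byte. A small sketch of the arithmetic, in Python rather than the tool's Perl, with an illustrative function name:

```python
def decode_wait_status(status):
    """Split a traditional 16-bit wait status into (exit_code, signal_number)."""
    exit_code = (status >> 8) & 0xFF   # high byte: the value passed to exit()
    signal_number = status & 0x7F      # low 7 bits: signal that ended the child, if any
    return exit_code, signal_number
```

A child that exits with code 1 produces a raw status of 256. Code that truncates the raw status to one byte sees 0 and reports success, which is the kind of bug the shift avoids.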

MySQL: Planet MySQL

New Maatkit release policy

Download Maatkit

Maatkit (formerly MySQL Toolkit) has for some time been released both as a bundle, and as individual tools. It's too much work to maintain the individual packages, and I don't think it really benefits anyone much, if at all. While the tools will still be versioned separately, I'm going to discontinue releases of the individual packages, and just release the one uber-package from now on.

This will also make it easier for me to manage the name change, but that's just an extra incentive; I've been considering this for a while.

By the way, Sourceforge indicated it would take up to a couple of days to finish the project's rename, but it took only a few minutes. Lots of broken links; I've asked for a permanent redirect from the old URLs to the new.

MySQL: Planet MySQL

MySQL Toolkit is now Maatkit

I am so lucky I married an archaeologist.

Choosing a new name for MySQL Toolkit has been a hassle. I wanted to avoid a literal name, such as, um, MySQL Toolkit. Short is good. And so on, and so on. All the while, the Phoenix/Firebird/Firefox naming debacle was in my thoughts. I only want to do this once.

Read on for a fun lesson in Egyptian mythology, courtesy of my wife!

MySQL: Planet MySQL

MySQL Toolkit version 1254 released

This release fixes several bugs introduced in the last release as I replaced untested code with tested code -- how ironic! Actually, I knew that was virtually guaranteed to happen. Anyway, all the bugs you've helped me find are now fixed. I also fixed a long-standing bug in MySQL Table Sync, which I am otherwise trying to touch as little as possible for the time being. (Remember to contribute to the bounty, and get your employer to contribute as well, so I can do some real work on it in the next month or so!)

The other big news is that the parallel dump and restore tools are now 1.0.0 because I consider them feature-complete. I have put the most work into tab-separated dumps. These two tools can do something MySQL AB's tools can't currently do: restore data before creating triggers (when doing tab-delimited dumps). That's an obvious requirement for loading data when tables have triggers. If you create the triggers before loading the data, you're practically guaranteed to end up with different data than was dumped. The tools now dump and reload both triggers and views. As long as you're dumping the mysql database, I think they should be able to completely duplicate a server (my initial goal was just data, not routines/triggers/views/etc).

Honestly, I hope MySQL's tools make this pair of tools obsolete in the future, but until then, they're a good way to dump and reload data at higher speeds. Keith Murphy did some measurements on parallel dump and restore speeds.

Read on for the full changelog.

MySQL: Planet MySQL
