Content Tagged with MySQL + perl

Percona wants to hire a Maatkit developer

Percona is looking to hire someone to develop Maatkit, among other things.

If I weren’t having so much fun being the consulting team lead, I’d be doing it myself. (In fact, I’m still hacking on it a lot. Got some pretty fun stuff done this weekend.) I don’t know what the rest of the world thinks, but I think Maatkit is a damn enjoyable project to work on. Hopefully someone else will have the same kind of mindset and want to get paid for it, unlike poor working-on-the-weekends me.

I’m not stepping away from the project. It’s just grown a lot, and there is room and money to grow it much more. This is actually the best compliment to the project: that it is worth hiring someone to keep improving it. Lots of people are using it, and there’s a lot of stealth-mode stuff I/we want to do with it too.

On a related note, who wants me to order another batch of Maatkit t-shirts? I’ve gotten quite a few questions about it.

MySQL: Planet MySQL

What it’s like to write a technical book, continued

My post on what it’s like to write a technical book was a stream-of-consciousness look at the process of writing High Performance MySQL, Second Edition. I got a lot of responses to it and learned some neat things I wouldn’t have learned if I hadn’t written the post. I also got a lot of questions, and my editor wrote a response too. I want to follow up on these things.

Was I fair, balanced and honest?

I really intended to write the post as just “here’s what it’s like, just so you’re prepared.” But at some point I got really deep into it and lost my context. That’s when I started to write about the things that didn’t go so smoothly with the publisher, and some of these things had a little extra sting in them that I would have done well to edit out.

All of us are human and the process wasn’t that bad, all things considered — the book was just a massive project that put huge demands on all of us and stressed everything from the capabilities of our chosen tools to our patience. As the editor points out in his response to my blog post, this is precisely why nobody else has ever been able to pull this off. This book stands head and shoulders above the crowd. It’s just hard to write, and very few people in the world actually have the knowledge to do it, much less the time, inclination, and ability.

Everything I said was (I believe) factual and correct, although as the editor points out there are different stories behind them. I also want to mention that I’d shared all those concerns with my editor; I avoid criticizing people behind their backs. In hindsight, throwing all of my concerns onto a blog post without warning isn’t the kind of thing I like to do either.

So I believe I was honest, but unfair to the editor. I’ve apologized to him. And by the way, yes I would work with him again, and I fully expect that it would be easier because I have learned more about the process.

I ran this post by my editor before publishing it.

A deeper explanation of my heuristics

Several people asked me to say more about my heuristics for improving the quality of the writing. I’ve already explained many of them, but here’s more:

(were|was|is|are|has been|be)( [a-zA-Z]+)? [a-zA-Z]+ed\>
This regular expression can help find some occurrences of passive voice. It finds a word or phrase that’s some variation on the verb “to be,” usually in the past tense; followed by an optional word (probably an adjective); followed by another word that ends in “-ed,” which is also potentially a verb in the past tense. This is not the only way to write in the passive voice, but it’s kind of the classic. Here are some examples: “the blog post was posted,” “the benchmark was rapidly created.”
(were|was|is|are|ha[sd] been|be)( [a-zA-Z]+)? [a-zA-Z]+e[dn]\>
An enhanced version. As I looked at the preceding point, I saw some other simple examples it doesn’t catch. For example, it doesn’t catch “had been” and it doesn’t catch verbs like “written.” Ironically, the first thing that came to mind as I thought about examples was “the book had been written.”
while|since
There’s nothing wrong with these words, except when they’re used in lieu of “because” to indicate causality. This is a problem for non-native English speakers, because these words have a temporal meaning too. For example, “Since MySQL 4.1 has no stored procedures, you have to use MySQL 5.0 if you want stored procedures.” If you aren’t a native English speaker, and even if you are, it’s easy to read that as “MySQL has had no stored procedures since version 4.1, …” and then when your eyes reach the part about MySQL 5.0, it makes no sense. My rule for this is to say “because” when I mean “because.”
using
Real examples: “Using MyISAM tables works very well” can become “MyISAM tables work very well.” And “A final possibility is simply to switch to using a table” can become “Finally, you can use a table” instead.
in order
The phrase “in order to” can almost always be replaced by “to.” It also tends to show a rough transition between the first and second phrases in a sentence. Perhaps these phrases should be integrated into a single phrase. “You can use this regex in order to find poorly constructed sentences” can become “this regex can find poorly constructed sentences” or “You can find poorly constructed sentences with this regex.” I prefer the latter; it is very direct, and that straightforward, simple writing style is really important in complex subject matter.
of course|without saying|obviously|clearly|needless
It goes without saying, but of course these words obviously point out when I’m writing stupid things that I clearly need to take a closer look at. Needless to say, most of the phrases in this paragraph are indeed needless to say. They are a red flag for lazy writing, such as glossing over a difficult point that should instead be explained — hard work, but necessary.
whether
I found quite a few places where the phrase “whether or not” was used. This can be shortened: “to see whether or not the disk is the problem” can become “to see whether the disk is the problem.” But better yet, the phrase often glues together poorly written phrases into an awkward sentence, just as “in order to” does. Can “whether” be replaced by “if?” Or does the sentence or paragraph just need to be reworked completely?
allow
This word can usually be replaced by “let.” “The remaining settings allow MySQL to allocate more RAM” can become “The remaining settings let MySQL allocate more RAM.” Occasionally, it is part of a larger phrase or thought that needs to be shortened and clarified. “When nobody is writing, readers obtain read locks that allow other readers to do the same” became “When nobody is writing, readers can obtain read locks, which don’t conflict with other read locks.”

ensure
I found that this word is often subtly misused. It really means “guarantee” but is often used as “double-check” or “make sure.” I don’t want to be too dogmatic about this word: its usage in modern English is complex (see the usage note on assure here; that in itself might be a reason to avoid it). But I found many places where I wanted to remove it in favor of an explicit instruction that tells the reader to take action. “Ensure” as an instruction is kind of a politically correct way to tell someone to do something, and I’m not afraid to just tell you to do it if I think you need to. I don’t want you to miss my meaning.
only
I have a habit of using this word incorrectly. “I only have ten fingers” should be “I have only ten fingers.”
as (we|you)|again,
These phrases usually show a place where the writing is confused and redundant. They show up in places like “as we already said, you should tune your server” and “again, you should tune your server.” Any instruction to the reader to break the narrative flow is a place to examine whether the concepts are in the right order. Cross-references, footnotes, and reminders are not always evil, but they’re to be regarded with suspicion.
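A minimal sketch of how a couple of these patterns could be run over a plain-text chapter from the command line. It assumes GNU grep, which accepts the \> end-of-word anchor used in the patterns above. The function names find_passive and find_filler are made up for illustration; they are not part of the scripts mentioned in this post.

```shell
#!/usr/bin/env bash
# Sketch only: assumes GNU grep (-E with the \> end-of-word anchor).
# find_passive and find_filler are hypothetical names, not the author's tools.

# Print "line-number:text" for lines that match the enhanced passive-voice regex.
find_passive() {
    grep -nE '(were|was|is|are|ha[sd] been|be)( [a-zA-Z]+)? [a-zA-Z]+e[dn]\>' "$@"
}

# Print lines containing the "needless to say" family of phrases, ignoring case.
find_filler() {
    grep -niE 'of course|without saying|obviously|clearly|needless' "$@"
}

# Small demonstration on three sample sentences.
printf '%s\n' \
    'the blog post was posted' \
    'MyISAM tables work very well' \
    'the book had been written' | find_passive
```

The demonstration flags the first and third sentences and leaves the second alone. Both functions also accept file names, so something like find_passive chapter3.txt (a hypothetical file) would scan a whole chapter. Expect false positives: the pattern has no anchor at the start of a word, so it can match inside longer words.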

Readability metrics

The tools I used to find sentences and phrases that score badly on some readability metric were pretty helpful to me as I tightened the writing up more and more. Nobody has reviewed the book yet, but I think when they do, they’ll be unlikely to mention “oh, and by the way the writing is wonderfully compact!” If we pulled this off right, you won’t notice that the writing is clear and compact. Writing is like a stereo system: you’re supposed to hear the music, not the speakers.

Anyway, my point is that we expanded the first edition’s actual coverage many times over, and ended up with only 658 pages of actual material. So the writing is much more compressed, and to do that you have to find and eliminate confusing writing. Confusing writing usually means that the concepts don’t flow clearly, and it takes more words to say the same thing because you’re kind of bumbling about, gesturing at your meaning from several angles instead of saying it clearly just once.

Here’s how I analyzed each chapter:

  • I used OpenOffice’s export feature to export the file to MediaWiki format. This is a plain-text markup format. I forget now why I didn’t just export to text, but there was something about MediaWiki format that made it easier to munge with Perl.
  • I ran my clean_text.pl program against the exported file to convert the format to a simpler one without special characters and markup. Some of the markup (footnotes, for example) stayed in the text and confused the metrics, but that’s life.
  • I ran my analyze_text.pl program against this to find the “worst” places.

As I wrote in my previous post, the analyzer uses a combination of readability metrics and “other stuff” to measure the badness of each sentence and paragraph. It aggregates sentences and paragraphs by the metrics. I calculated the number of words, percent of complex words, syllables per word, number of sentences, words per sentence, and a bunch of other things, as well as the standard readability metrics. Each sentence and paragraph got scored on these. Then I printed overall metrics, and sorted the sentences and paragraphs worst-first and printed out a snippet of the offending text. Here’s a sample of chapter 3’s metrics (originally numbered chapter 4) at some intermediate stage in the writing process.

This was a lot of work. If I had been writing with Vim, I could have done better. I could have used the compiler integration and set my “make” program to the analysis program. If you use Vim and you don’t know about this, it’s a pity. My next book will be written in Vim, by the way.

Actually, I probably could have done better regardless, but this was good enough. I just searched for the snippets and then examined what was going on.

There were some false positives. For example, bullet-points often scored badly on the readability metrics, and so a five-word bullet point item would look like terrible writing just because it was short enough that it had a high percentage of complex words. It’s not an exact science. Maybe next time will be better.
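The worst-first idea described above can be sketched in a few lines of shell. This is only an illustration covering a single metric, words per sentence. The helper name rank_sentences is an assumption and is not taken from analyze_text.pl, and the sentence splitting is deliberately crude.

```shell
#!/usr/bin/env bash
# Sketch only: ranks sentences by word count, one of the metrics named above.
# rank_sentences is a hypothetical helper, not analyze_text.pl.

# Read plain text on stdin; print "word-count<TAB>sentence", longest first.
rank_sentences() {
    tr '\n' ' ' |          # join wrapped lines into one stream
    tr '.!?' '\n\n\n' |    # crude split: one sentence per line (also splits "4.1")
    awk 'NF { n = NF; $1 = $1; printf "%d\t%s\n", n, $0 }' |
    sort -k1,1nr           # worst (longest) sentences first
}

printf '%s\n' 'Short one. This sentence has quite a few more words in it. Mid length here.' |
    rank_sentences
```

The demonstration lists the ten-word sentence first, then the three-word one, then the two-word one. Searching the chapter for the first few snippets is then the same workflow as described above. A fuller version would add the other metrics (complex words, syllables per word) as extra columns.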

If you’d like to see the source code, here’s the clean_text.pl and here’s the analyze_text.pl. Enjoy!

MySQL: Planet MySQL

apache friends - very easy apache, mysql, php and perl installation without hassles

Many people know from their own experience that it's not easy to install an Apache web server and it gets harder if you want to add MySQL, PHP and Perl. XAMPP is an easy to install Apache distribution containing MySQL, PHP and Perl. XAMPP is really very e

opensource: del.icio.us tag/opensource

Using XML with MySQL

A growing number of applications today use data represented in the form of XML documents. The following list indicates just some of the possibilities open to you for employing [...]

XML: del.icio.us/tag/xml

Subversion: What to do when your repository server moves to another IP?

This weekend our networking guys decided to change the IPs of all of our servers, including our Subversion server’s IP. This caused some issues in the Subversion world for developers whose checkouts pointed to an IP instead of a hostname, created with a command similar to: svn co svn+ssh://192.168.1.10/svn/myrepos/ /home/mycheckout/ Now when they do “svn update” inside [...]

MySQL: Planet MySQL

Get Maatkit fast from the command line

I have been using Maatkit in a different way since I joined Percona as a consultant. When I’m working on a system now, it’s a new, unfamiliar system — not one where I have already installed my favorite programs. And that means I want to grab my favorite productivity tools fast.

I intentionally wrote the Maatkit tools so they don’t need to be “installed.” You just run them, that’s all. But I never made them easy to download.

I fixed that. Now, at the command line, you can just run this:

wget http://www.maatkit.org/get/mk-table-sync

Now it’s ready to run. Behind the scenes are some Apache mod_rewrite rules, a Perl script or two, and Subversion. When you do this, you’re getting the latest code from Subversion’s trunk.[1][2] (I like to run on the bleeding edge. Releases are for people who want to install stuff.)

Because there’s some Perl magic behind it, I made it even easier — it does pattern-matching on partial names and Does The Right Thing:

baron@kanga:~$ wget http://www.maatkit.org/get/sync
--21:38:50--  http://www.maatkit.org/get/sync
           => `sync'
Resolving www.maatkit.org... 64.130.10.15
Connecting to www.maatkit.org|64.130.10.15|:80... connected.
HTTP request sent, awaiting response... 302 Moved
Location: http://www.maatkit.org/get/mk-table-sync [following]
--21:38:50--  http://www.maatkit.org/get/mk-table-sync
           => `mk-table-sync'
Connecting to www.maatkit.org|64.130.10.15|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/x-perl]

    [      <=>                            ] 163,259      136.51K/s             

21:38:51 (136.13 KB/s) - `mk-table-sync' saved [163259]

The redirection is there because otherwise wget will save the file under the name ‘sync’ instead of ‘mk-table-sync’.
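To make the partial-name idea concrete, here is a rough sketch of one way such a lookup could work. It is not the code running behind maatkit.org: the tool list, the resolve_tool name, and the choice to reject ambiguous fragments are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: NOT the code behind maatkit.org. The tool list and the
# resolve_tool name are assumptions made for illustration.
TOOLS='mk-archiver mk-find mk-heartbeat mk-table-checksum mk-table-sync'

# Print the one tool whose name contains the fragment; fail if zero or several match.
resolve_tool() {
    local matches
    matches=$(printf '%s\n' $TOOLS | grep -F -- "$1" || true)
    [ "$(printf '%s' "$matches" | grep -c .)" -eq 1 ] || return 1
    printf '%s\n' "$matches"
}

resolve_tool sync    # prints mk-table-sync
```

A server-side script could use the resolved name as the target of the redirect shown in the wget output above. In this sketch a fragment such as "table" fails because it matches two tools.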

And if you’ve forgotten which tools exist, you can just click on over to http://www.maatkit.org/get/ and see.

A quick poll: instead of getting the latest trunk, should this give you the code from the last release? I can do that, if you want.

[1] OK, it’s only refreshed every hour. So you’re getting code that’s up to an hour old.

[2] update: now /get/foo gets the latest release, and /trunk/foo gets the latest trunk code.

MySQL: Planet MySQL

Maatkit in RHEL and CentOS

Update: Karanbir says “Just one thing to keep in mind is that we dont want too many people using it from the Testing repository - we only need enough feedback to move it from testing to stable ( and to be honest, there are already 8 people who have said yes it works - so move to stable should happen within the next 24 - 48 hrs ). Once the package is in stable, users on CentOS4 and 5 wont need to do anything more than just ‘yum install maatkit’ and it will install for them.”

At least one person (Karanbir Singh) is working to get Maatkit into the CentOS repositories, and I believe there might be movement towards RHEL also. From an email to the Maatkit discussion list a little while ago:

I am in the process of getting maatkit into the CentOS-Extras repositories. The first step for that is that every package needs to go into a CentOS-Testing repo and feedback is required from the project and users on its stability / usability and packaging quality.

maatkit-1887 is now available in the CentOS-Testing[*] repo’s and as soon as we can get some feedback ( needs to be 5 different people, none of whom can be CentOS Developers ) - the packages will move into the main repository so that all users can get access.

I’d appreciate it if people on this were able to give those packages a go and let me know if there are any issues. You can leave feedback :

  • via the maatkit-discuss mailing list (http://sourceforge.net/mailarchive/forum.php?forum_name=maatkit-discuss)
  • on the centos-devel list ( http://lists.centos.org ) or
  • http://bugs.centos.org/ against category ‘maatkit’

[*] : Info about the Testing repo and howto set it up on your machine : http://wiki.centos.org/Repositories

If you’re interested in getting Maatkit into these repositories, please take a moment and give the requested feedback. I can’t do it because it would be a conflict of interest for the main developer to assert that the code is stable and usable.

MySQL: Planet MySQL

Centralised Notification (Aka Informeer)

It’s been a while since I had a chance to work on Informeer, as my itch was multi-user, web-based password management (AuthStor). Oh, and moving house.

Now that things are settling down again (Servers back up and running) I decided to take a break from AuthStor and focus on something new - Informeer.

The concept is simple, Centralised Notification.

I am forever configuring notifications from several sources, be it backup alerts, host monitoring notification and even simple applications that send mail via SMTP. When living in a world of change, both software and business, having to visit every application to change an e-mail address or add a new user to a notification schedule can be quite time consuming. Add to that the effort of having to modify firewalls, SMTP servers and XMPP settings etc etc. The idea of a single web interface where all your notifications can be configured is quite appealing. Informeer aims to make that a reality (one day).

While I take my time with the implementation I thought I would post a basic intro to RPC-XML using Perl. The RPC-XML Perl module makes writing your own client/server application a piece of cake, and while not quite point and click, it will give you enough flexibility to centralise your own notifications (or anything else for that matter). It’s not rocket science, but it works!

If you have been looking for a flexible, quick and easy fix to the centralised notification problem, or even if you are just looking to push some data from a firewalled site to one of your servers, this tutorial should help get you started.

In between moving house and playing with notifications I found time to upgrade to Wordpress 2.5 and MoinMoin 1.6, and wow, what an improvement in both camps.
The Wordpress dashboard is the most visible improvement - it’s amazingly clean! If you have been waiting for 2.5 to settle down or are just cautious about upgrading, I would say take the plunge, you won’t regret it. As for MoinMoin 1.6, there really is no better Wiki engine out there (my opinion) and it just keeps getting better with every release!

That just leaves me to post a quick MySQL tip that may well be obvious, but happens to catch me out from time to time.

MySQL Tip of the month

If you ever get the urge to convert a 1 GB MyISAM table (with 34 million rows) to InnoDB within MySQL, try to avoid the simple ALTER TABLE tablename ENGINE=INNODB; method.
I made the mistake of running that command on a fairly decent server with bags of space and memory, only to find the command still running more than 5 hours later with an InnoDB tablespace at around 4 GB (OK, a bit of tuning might have helped) ;)

Your best bet is to dump the table using mysqldump, drop the existing table, and re-import it after modifying the definition to be InnoDB. That worked for me in less than 10 minutes. Alternatively, you can create a new InnoDB table and insert directly from the existing table as per the MySQL documentation, e.g. INSERT INTO newtable SELECT * FROM oldtable.
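For the dump-and-reload route, the "modify the definition" step can be scripted instead of edited by hand. This is a rough sketch under assumptions: mydb, mytable, and the to_innodb helper are hypothetical names, and it relies on the dump closing each CREATE TABLE with a line that starts with ") ENGINE=MyISAM".

```shell
#!/usr/bin/env bash
# Sketch only: mydb, mytable, and to_innodb are hypothetical names.

# Rewrite the table-options line of each CREATE TABLE in a mysqldump stream.
# Anchoring on "^) ENGINE=" leaves row data that mentions MyISAM untouched.
to_innodb() {
    sed 's/^) ENGINE=MyISAM/) ENGINE=InnoDB/'
}

# Intended use (not run here; needs a live server and credentials):
#   mysqldump mydb mytable > mytable.sql
#   to_innodb < mytable.sql | mysql mydb
```

Dumping to a file first, rather than piping straight from mysqldump into mysql, keeps a copy of the data around in case the reload goes wrong. Check the dump file before dropping anything.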

I’m sure I am not the first to make this mistake, and may not be the last…..

MySQL: Planet MySQL

Improved Cacti monitoring templates for MySQL

Download MySQL Cacti templates

As promised, I’ve created some improved software for monitoring MySQL via Cacti. I began using the de facto MySQL Cacti templates a while ago, but found some things I needed to improve about them. As time passed, I rewrote everything from scratch. The resulting templates are much improved.

You can grab the templates by browsing the source repository on the project’s homepage.

In no particular order, here are some things I improved:

  • Standard polling interval and graph size by default.
  • Full captions on every graph; you don’t have to guess at how big the values are. Each graph has current, max, and average values printed at the bottom for every value on it.
  • Much more data is captured. I’ve graphed almost everything I could think of.
  • The graphs are grouped better. Most graphs have only related values. There are some exceptions, but not many.
  • The templates don’t hijack your existing installation. They don’t depend on or alter anything in your default Cacti installation.
  • The script that gathers the data is totally rewritten from scratch, and much improved. For example, the math works on 32-bit systems. It has caching built-in so each poll cycle results in just one request to the server, instead of one request per graph. (This is a weakness of Cacti I’m trying to work around). It also has debugging aids and other good coding stuff.
  • By default, it assumes you have the same username and password across every server you’re monitoring, so you don’t have to fill in a username and password for every single graph you create.
  • One data template == one graph template. This helps work around another Cacti limitation.
  • Lots more. Honestly I can’t really remember everything I’ve done. I’m sure you’ll help me remember by asking me how to get X feature working the way you want, and I’ll go “oh, yeah, that’s another thing I improved…”

Cacti templates are very laborious to create if they’re complex at all; it takes a long time and is very error-prone. Instead of doing it through Cacti’s web interface and exporting a huge XML file, I eliminated the redundancies and created a small, easy-to-maintain file from which I generate the XML template with a Perl script. This gives the added benefit of letting me (or you) generate templates with different parameters such as polling interval or graph size. The README file has the full details. However, I’ve pre-generated a set of templates that matches Cacti’s defaults, so you can probably just use that.

This has taken a lot of time. In particular, I spent a lot of time working on it at my former employer, The Rimm-Kaufman Group (kudos to them for letting me open-source the work) and I just spent most of my weekend writing the scripts to convert from the compact format to XML templates, so it’s possible to maintain these beasts. Plus I had to develop the compact format, too. This took a lot of time because I had to understand the Cacti data model, which is pretty complex.

Please enter issue reports for bugs, feature requests, etc at the Google project homepage, not in the comments of this blog post. I do not look through comments on my blog when I’m trying to remember what I should be working on for a software project.

If these templates help you and you feel like visiting my Amazon.com wishlist and sending something my way, I’d appreciate it!

PS: You may also be interested in Alexey Kovyrin’s list of templates for monitoring servers.

MySQL: Planet MySQL

mylvmbackup 0.8 has been released

I am happy to announce the release of mylvmbackup version 0.8. mylvmbackup is a tool for quickly creating backups of a MySQL server's data files. To perform a backup, mylvmbackup obtains a read lock on all tables and flushes all server caches to disk, makes an LVM snapshot of the volume containing the MySQL data directory, and unlocks the tables again. The snapshot process takes only a small amount of time. When it is done, the server can continue normal operations, while the actual file backup proceeds.

Below is the list of changes since version 0.6. You may wonder what happened to version 0.7 - it had a rather short life cycle as I was informed about a bug that I fixed quickly before I made a wider release announcement of 0.7.

  • Fixed a bug in the InnoDB recovery function: the second mysqld process clobbered the socket file of the primary MySQL instance (thanks to Alain Hoang for reporting this)
  • Updated the man page, noted some other limitations of the InnoDB recovery function
  • Bug fix: use the correct mysqld parameter to provide an alternative PID file (--pid-file instead of --pidfile) - thanks to Guillaume Boddaert and Jim Wilson for reporting this!
  • Added option "--skip_mycnf" to skip including a copy of the MySQL configuration file in the backup, added a safety check that the file actually exists prior to backing it up.

Updated packages are available from the home page and via the openSUSE Build Service as usual. Updated packages for Debian/Ubuntu and Gentoo Linux should also be available shortly. Enjoy!

Speaking of LVM snapshot backups: I will be giving a talk about this subject at our MySQL Conference 2008 in Santa Clara, CA next week. If you are curious about how MySQL can be backed up using this technology, please consider stopping by!

 

MySQL: Planet MySQL

Maatkit version 1877 released

Download Maatkit

Maatkit contains essential command-line utilities for MySQL, such as a table checksum tool and query profiler. It provides missing features such as checking slaves for data consistency, with emphasis on quality and scriptability.

This release contains major bug fixes and new features. Some of the changes are not backwards-compatible. It also contains new tools to help you discover replication slaves and move them around the replication hierarchy.

Changelog for mk-archiver:

2008-03-16: version 1.0.8

   * Added --setvars option (bug #1904689, bug #1911371).
   * Added --charset option (bug #1877548).
   * Changed short form of --analyze to -Z to avoid conflict with --charset.

Changelog for mk-deadlock-logger:

2008-03-16: version 1.0.9

   * Added --setvars option (bug #1904689, bug #1911371).
   * Added 'A' part to DSNs (bug #1877548).

Changelog for mk-duplicate-key-checker:

2008-03-16: version 1.1.5

   * Added --setvars option (bug #1904689, bug #1911371).
   * Added --charset option (bug #1877548).

Changelog for mk-find:

2008-03-16: version 0.9.10

   * Added --setvars option (bug #1904689, bug #1911371).
   * Added --charset option (bug #1877548).

Changelog for mk-heartbeat:

2008-03-16: version 1.0.8

   * Added --setvars option (bug #1904689, bug #1911371).
   * Added --charset option (bug #1877548).

Changelog for mk-parallel-dump:

2008-03-16: version 1.0.7

   * Added --setvars option (bug #1904689, bug #1911371).
   * Added --charset option (bug #1877548).
   * A global database connection was re-used by children, causing a hang.

Changelog for mk-parallel-restore:

2008-03-16: version 1.0.6

   * Added --setvars option (bug #1904689, bug #1911371).
   * Changed --charset to be compatible with other tools (bug #1877548).

Changelog for mk-query-profiler:

2008-03-16: version 1.1.9

   * Added --setvars option (bug #1904689, bug #1911371).
   * Added --charset option (bug #1877548).

Changelog for mk-show-grants:

2008-03-16: version 1.0.9

   * Added --setvars option (bug #1904689, bug #1911371).
   * Added --charset option (bug #1877548).

Changelog for mk-slave-delay:

2008-03-16: version 1.0.6

   * Added --setvars option (bug #1904689, bug #1911371).
   * Added 'A' part to DSNs (bug #1877548).

Changelog for mk-slave-find:

2008-03-16: version 1.0.0

   * Initial release.

Changelog for mk-slave-move:

2008-03-16: version 0.9.0

   * Initial release.

Changelog for mk-slave-prefetch:

2008-03-16: version 1.0.1

   * Added --setvars option (bug #1904689, bug #1911371).
   * Added --charset option (bug #1877548).

Changelog for mk-slave-restart:

2008-03-16: version 1.0.6

   * Added --setvars option (bug #1904689, bug #1911371).
   * Added --charset option (bug #1877548).
   * Added logic to repair tables, and rewrote a lot of code.
   * Added --always option, disabled by default.  Not backwards compatible.
   * --daemonize did not work.
   * --quiet caused an undefined variable error.

Changelog for mk-table-checksum:

2008-03-16: version 1.1.26

   * Added --setvars option (bug #1904689, bug #1911371).
   * Added 'A' part to DSNs (bug #1877548).
   * Added --unique option to mk-checksum-filter.
   * The exit status from mk-checksum-filter was always 0.
   * mk-table-checksum now prefers to discover slaves via SHOW PROCESSLIST.

Changelog for mk-table-sync:

2008-03-16: version 1.0.6

   * --chunksize was not being converted to rowcount (bug #1902341).
   * Added --setvars option (bug #1904689, bug #1911371).
   * Deprecated the --utf8 option in favor of the A part in DSNs.
   * Mixed-case identifiers caused case-sensitivity issues (bug #1910276).
   * Prefer SHOW PROCESSLIST when looking for slaves of a server.

Changelog for mk-visual-explain:

2008-03-16: version 1.0.7

   * Added --setvars option (bug #1904689, bug #1911371).
   * Added --charset option (bug #1877548).

MySQL: Planet MySQL

apache friends - xampp

Easy to install Apache distribution containing MySQL, PHP and Perl. XAMPP is really very easy to install and to use - just download, extract and start.

opensource: del.icio.us tag/opensource

Aggregate Functions with Perl stored procedures.

I found myself thinking this evening about how someone could set about writing aggregate functions using Perl and on an idle Google search, came across this webpage entitled User-defined Aggregate Functions in DB2 Universal Database.

I wondered if a similar technique could be applied for our implementation of External Language Stored Procedures for MySQL. It turns out that the answer is: Yes.

Suppose you had the following Perl declarations in a Perl module:

our %summary=();

# Accumulate count, sum, and sum of squares for one group.
sub aggregate_add($$)
{
    my ($value,$group)= @_;
    if (defined $value)
    {
        $summary{$group}= {value=>0.0, value2=>0.0, count=>0}
            if !defined $summary{$group};
        my $scalar= scalar $value;
        $summary{$group}{value}+= $scalar;
        $summary{$group}{value2}+= $scalar * $scalar;
        $summary{$group}{count}++;
    }
    return $value;
}

# Format the statistics accumulated so far for one group.
sub aggregate_result($$)
{
    my ($value,$group)= @_;
    return undef if !defined $summary{$group};
    my $count= $summary{$group}{count};
    my $average= $summary{$group}{value} / $count;
    my $sqavg= $summary{$group}{value2} / $count;
    my $variance= $sqavg - ($average * $average);
    my $stddev= sqrt $variance;
    return sprintf("count=%d avg=%0.3f var=%0.3f std=%0.3f",
        $count, $average, $variance, $stddev);
}


Now we need to test this little piece of Perl magic by starting up a client session...

mysql> CREATE TABLE t1 (grp INT, a DOUBLE);
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO t1 VALUES (1,1), (2,2), (2,3), (3,4), (3,5), (3,6);
Query OK, 6 rows affected (0.00 sec)
Records: 6 Duplicates: 0 Warnings: 0

mysql> CREATE FUNCTION test.agg_result(value DOUBLE, grp INT) RETURNS CHAR(128)
    -> LANGUAGE Perl NO SQL EXTERNAL NAME 'Foo::aggregate_result';
Query OK, 0 rows affected (0.01 sec)

mysql> CREATE FUNCTION test.agg_add(value DOUBLE, grp INT) RETURNS DOUBLE
    -> LANGUAGE Perl NO SQL EXTERNAL NAME 'Foo::aggregate_add';
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT COUNT(a),
    -> CAST(AVG(a) AS DECIMAL(7,3)) 'AVG',
    -> CAST(VARIANCE(a) AS DECIMAL(7,3)) 'VAR',
    -> CAST(STD(a) AS DECIMAL(7,3)) 'STD',
    -> test.agg_result(MAX(test.agg_add(a,grp)),grp) 'TEST'
    -> FROM t1 GROUP BY grp;
+----------+-------+-------+-------+---------------------------------------+
| COUNT(a) | AVG   | VAR   | STD   | TEST                                  |
+----------+-------+-------+-------+---------------------------------------+
|        1 | 1.000 | 0.000 | 0.000 | count=1 avg=1.000 var=0.000 std=0.000 |
|        2 | 2.500 | 0.250 | 0.500 | count=2 avg=2.500 var=0.250 std=0.500 |
|        3 | 5.000 | 0.667 | 0.816 | count=3 avg=5.000 var=0.667 std=0.816 |
+----------+-------+-------+-------+---------------------------------------+
3 rows in set (0.04 sec)


Whoot! It actually works! Note that this only works because the Perl instances are not shared between threads, so every call to those subroutines, and every variable they touch, stays within the calling thread. This avoids any nasty synchronization headaches. I would imagine that it would not be much more difficult to do something similar with the Java plugin, except that the global statistics variable would have to be a ThreadLocal instance to keep multiple threads from conflicting with each other.
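For anyone who wants to check the arithmetic without a server, here is a minimal standalone sketch of the same running-sum bookkeeping in plain Perl. It is not the plugin code itself. The `defined $value` guard is my assumption about the part of `aggregate_add` not shown here, and `aggregate_result` takes only the group, because outside SQL there is no need for the dummy first argument that forces the per-row calls.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Per-group running sums: sum of values, sum of squared values, row count.
my %summary;

# Fold one value into its group's running sums and hand the value back.
# Assumption: undefined (NULL) values are skipped, as SQL aggregates do.
sub aggregate_add {
    my ($value, $group) = @_;
    if (defined $value) {
        my $s = $summary{$group} ||= { value => 0.0, value2 => 0.0, count => 0 };
        $s->{value}  += $value;
        $s->{value2} += $value * $value;
        $s->{count}++;
    }
    return $value;
}

# Turn one group's running sums into count, mean, population variance, std dev.
sub aggregate_result {
    my ($group) = @_;
    my $s = $summary{$group};
    return undef if !defined $s;
    my $count    = $s->{count};
    my $average  = $s->{value} / $count;
    my $variance = $s->{value2} / $count - $average * $average;
    $variance = 0 if $variance < 0;    # rounding guard so sqrt cannot die
    return sprintf("count=%d avg=%0.3f var=%0.3f std=%0.3f",
                   $count, $average, $variance, sqrt $variance);
}

# The same (grp, a) rows as the INSERT in the client session.
my @rows = ([1, 1], [2, 2], [2, 3], [3, 4], [3, 5], [3, 6]);
aggregate_add($_->[1], $_->[0]) for @rows;

print "$_: ", aggregate_result($_), "\n" for sort { $a <=> $b } keys %summary;
# 1: count=1 avg=1.000 var=0.000 std=0.000
# 2: count=2 avg=2.500 var=0.250 std=0.500
# 3: count=3 avg=5.000 var=0.667 std=0.816
```

The three lines match the TEST column above. Because `%summary` is an ordinary file-scoped hash, this only stays correct while a single thread of execution owns it, which is the same condition the plugin relies on.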

I guess I shall just have to include it as a test case, at least until a concrete method to declare aggregate stored functions arrives. I would imagine that such a declaration would name existing stored procedures in order to construct the aggregate function.

Something to think about for the future.

(An unrelated side note: March is Endometriosis Awareness Month. Please spare a thought for women who suffer from this debilitating disease... Thanks)

MySQL: Planet MySQL

Maatkit version 1753 released

Download Maatkit

This release contains minor bug fixes and new features. Besides the little bug fixes, there's a fun new feature in mk-heartbeat: it can auto-discover slaves recursively, and show the replication delay on all of them, to wit:

baron@keywest ~ $ mk-heartbeat --check --host master -D rkdb --recurse 10
master 0
slave1 1
slave2 1
slave3 4

(Not actual results. Your mileage may vary. Closed course, professional driver. Do not attempt.)

Nothing else in this release is very exciting. I just wanted to get the bug fixes out there.

MySQL: Planet MySQL

Maatkit version 1709 released

This release contains bug fixes and new features. It also contains a new tool: my implementation of Paul Tuckfield's relay log pipelining idea. I have had quite a few responses to that blog post, and requests for the code. So I'm releasing it as part of Maatkit.

MySQL: Planet MySQL

How pre-fetching relay logs speeds up MySQL replication slaves

I dashed off a hasty post about speeding up replication slaves, and gave no references or explanation. That's what happens when I write quickly! This post explains what the heck I was talking about.

MySQL: Planet MySQL

Speed up your MySQL replication slaves

Paul Tuckfield of YouTube has spoken about how he sped up his slaves by pre-fetching the slave's relay logs. I wrote an implementation of this, tried it on my workload, and it didn't speed them up. (I didn't expect it to; I don't have the right workload). I had a few email exchanges with Paul and some other experts on the topic and we agreed my workload isn't going to benefit from the pre-fetching.

In the meantime, I've got a pretty sophisticated implementation of Paul's idea just sitting around, unused. I haven't released it for the same reasons Paul didn't release his: I'm afraid it might do more harm than good.

However, if you'd like the code, send me an email at [baron at this domain] and I'll share the code with you. In return, I would like you to tell me about your hardware and your workload, and to do at least some rudimentary benchmarks to show whether it works or not on your workload. If I find that this is beneficial for some people, I may go ahead and release the code as part of Maatkit.

MySQL: Planet MySQL

The Uniform Server

The Uniform Server is a WAMP package that allows you to run a server on any MS Windows OS based computer. It is small...

WAMP: del.icio.us/tag/wamp

The Uniform Server

The Uniform Server is a WAMP package that allows you to run a server on any MS Windows OS based computer. It is small and mobile to download or move around and can also be used or setup as a production/live server. Developers also use The Uniform Server t

open-source: del.icio.us/tag/open-source

What is new in Maatkit

My posts lately have been mostly progress reports and release notices. That's because we're in the home stretch on the book, and I don't have much spare time. However, a lot has also been changing with Maatkit, and I wanted to take some time to write about it properly.

MySQL: Planet MySQL

Maatkit version 1674 released

This release contains bug fixes and new features. Click through to the full article for the details. I'll also write more about the changes in a separate article.

MySQL: Planet MySQL
