This post is SEO bait for people trying to scale MySQL’s write capacity by writing to both servers in master-master replication. The short answer: you can’t do it. It’s impossible.
I keep hearing this line of reasoning: “if I make a MySQL replication ‘cluster’ and move half the writes to machine A and half of them to machine B, I can increase my overall write capacity.” It’s a fallacy. All writes are repeated on both machines: the writes you do on machine A are repeated via replication on machine B, and vice versa. You don’t shield either machine from any of the load.
In addition, doing this introduces a very dangerous side effect: in case of a problem, neither machine has the authoritative data. Neither machine’s data can be trusted, but neither machine’s data can be discarded either. This is a very difficult situation to recover from. Save yourself grief, work, and money. Never write to both masters.
high availability, mysql, replication, scalingThere is much discussion of why MySQL is more widely adopted than PostgreSQL. The discussion I’ve heard is mostly among the PostgreSQL community members, who believe their favorite database server is better in many ways, and are sometimes puzzled why people would choose an inferior product.
There are also many comparison charts that show one server is better than the other in some ways. These don’t really seem to help people with this question, either!
I can’t answer for everyone, but I can put it in the form of a question: if I were to replace MySQL with PostgreSQL, what things do I rely on that would become painful or even force a totally different strategy? The answer turns out to be fairly simple for me: replication and upgrades.
Love it or hate it, MySQL’s built-in replication is absolutely key to much of what I do with MySQL. I can truthfully say that it has lots of problems and limitations. But I can also say this about it:
Regardless of the technical strengths and weaknesses of each database’s replication systems, it is my perception that MySQL’s ultimately lets me do incredibly flexible and useful things; in general it is Just Enough and has just the right combinations of qualities for lots of purposes. And each of its weaknesses is easily avoided or worked around, or just sidestepped — because MySQL replication’s simplicity and flexibility lets me easily choose a different approach.
MySQL’s files are extremely portable between versions, between operating systems, and even between platforms most of the time (unless you have a system that doesn’t use IEEE floating-point format, but who does these days?). That means an upgrade is dead simple.
This may not seem like a big deal, but I work with a lot of data. When you do that, you have to consider the alternatives: what if I couldn’t upgrade in-place?
That’s the current state of PostgreSQL. You have to dump and reload your data, and when you have a terabyte of data, that’s no fun. The workarounds usually involve replicating your data to another server, switching to the other server, upgrading, and switching back. But why should you have to have another server just to upgrade your data?
I see this as a significant — even critical — sticking point. It’s something I just don’t have to think about most of the time with MySQL
Not for the systems I work on. These two problems seem extremely difficult for me to work around. I rely so heavily on MySQL’s replication and in-place upgrades that it feels too daunting to live without them.
What I’m trying to do here is give some psychological insight into what makes me feel happy with MySQL, and afraid of the thought of having to solve these problems with PostgreSQL. It may or may not apply broadly; my sense is that these are concerns for others as well, but I could be wrong.
If I were primarily a PostgreSQL user, I’m sure there would be similar feelings the other direction. This would explain why some people in the PostgreSQL camp seem to recoil away from MySQL. I’d be interested to hear why that is, too.
mysql, replication, upgradesMaatkit contains essential command-line utilities for MySQL, such as a table checksum tool and query profiler. It provides missing features such as checking slaves for data consistency, with emphasis on quality and scriptability.
This release contains major bug fixes and new features. Some of the changes are not backwards-compatible. It also contains new tools to help you discover replication slaves and move them around the replication hierarchy.
Changelog for mk-archiver: 2008-03-16: version 1.0.8 * Added --setvars option (bug #1904689, bug #1911371). * Added --charset option (bug #1877548). * Changed short form of --analyze to -Z to avoid conflict with --charset. Changelog for mk-deadlock-logger: 2008-03-16: version 1.0.9 * Added --setvars option (bug #1904689, bug #1911371). * Added 'A' part to DSNs (bug #1877548). Changelog for mk-duplicate-key-checker: 2008-03-16: version 1.1.5 * Added --setvars option (bug #1904689, bug #1911371). * Added --charset option (bug #1877548). Changelog for mk-find: 2008-03-16: version 0.9.10 * Added --setvars option (bug #1904689, bug #1911371). * Added --charset option (bug #1877548). Changelog for mk-heartbeat: 2008-03-16: version 1.0.8 * Added --setvars option (bug #1904689, bug #1911371). * Added --charset option (bug #1877548). Changelog for mk-parallel-dump: 2008-03-16: version 1.0.7 * Added --setvars option (bug #1904689, bug #1911371). * Added --charset option (bug #1877548). * A global database connection was re-used by children, causing a hang. Changelog for mk-parallel-restore: 2008-03-16: version 1.0.6 * Added --setvars option (bug #1904689, bug #1911371). * Changed --charset to be compatible with other tools (bug #1877548). Changelog for mk-query-profiler: 2008-03-16: version 1.1.9 * Added --setvars option (bug #1904689, bug #1911371). * Added --charset option (bug #1877548). Changelog for mk-show-grants: 2008-03-16: version 1.0.9 * Added --setvars option (bug #1904689, bug #1911371). * Added --charset option (bug #1877548). Changelog for mk-slave-delay: 2008-03-16: version 1.0.6 * Added --setvars option (bug #1904689, bug #1911371). * Added 'A' part to DSNs (bug #1877548). Changelog for mk-slave-find: 2008-03-16: version 1.0.0 * Initial release. Changelog for mk-slave-move: 2008-03-16: version 0.9.0 * Initial release. Changelog for mk-slave-prefetch: 2008-03-16: version 1.0.1 * Added --setvars option (bug #1904689, bug #1911371). * Added --charset option (bug #1877548). Changelog for mk-slave-restart: 2008-03-16: version 1.0.6 * Added --setvars option (bug #1904689, bug #1911371). * Added --charset option (bug #1877548). * Added logic to repair tables, and rewrote a lot of code. * Added --always option, disabled by default. Not backwards compatible. * --daemonize did not work. * --quiet caused an undefined variable error. Changelog for mk-table-checksum: 2008-03-16: version 1.1.26 * Added --setvars option (bug #1904689, bug #1911371). * Added 'A' part to DSNs (bug #1877548). * Added --unique option to mk-checksum-filter. * The exit status from mk-checksum-filter was always 0. * mk-table-checksum now prefers to discover slaves via SHOW PROCESSLIST. Changelog for mk-table-sync: 2008-03-16: version 1.0.6 * --chunksize was not being converted to rowcount (bug #1902341). * Added --setvars option (bug #1904689, bug #1911371). * Deprecated the --utf8 option in favor of the A part in DSNs. * Mixed-case identifiers caused case-sensitivity issues (bug #1910276). * Prefer SHOW PROCESSLIST when looking for slaves of a server. Changelog for mk-visual-explain: 2008-03-16: version 1.0.7 * Added --setvars option (bug #1904689, bug #1911371). * Added --charset option (bug #1877548).
Suppose you have a master-master replication setup, and you know one of the tables has the wrong data. How do you re-sync it with the other server?
Warning: don't just use any tool for this job! You may destroy your good copy of the data.
This release contains minor bug fixes and new features. Besides the little bug fixes, there's a fun new feature in mk-heartbeat: it can auto-discover slaves recursively, and show the replication delay on all of them, to wit:
baron@keywest ~ $ mk-heartbeat --check --host master -D rkdb --recurse 10 master 0 slave1 1 slave2 1 slave3 4
(Not actual results. Your mileage may vary. Closed course, professional driver. Do not attempt).
Nothing else in this release is very exciting. I just wanted to get the bug fixes out there.
This release contains bug fixes and new features. It also contains a new tool: my implementation of Paul Tuckfield's relay log pipelining idea. I have had quite a few responses to that blog post, and requests for the code. So I'm releasing it as part of Maatkit.
I dashed off a hasty post about speeding up replication slaves, and gave no references or explanation. That's what happens when I write quickly! This post explains what the heck I was talking about.
Paul Tuckfield of YouTube has spoken about how he sped up his slaves by pre-fetching the slave's relay logs. I wrote an implementation of this, tried it on my workload, and it didn't speed them up. (I didn't expect it to; I don't have the right workload). I had a few email exchanges with Paul and some other experts on the topic and we agreed my workload isn't going to benefit from the pre-fetching.
In the meantime, I've got a pretty sophisticated implementation of Paul's idea just sitting around, unused. I haven't released it for the same reasons Paul didn't release his: I'm afraid it might do more harm than good.
However, if you'd like the code, send me an email at [baron at this domain] and I'll share the code with you. In return, I would like you to tell me about your hardware and your workload, and to do at least some rudimentary benchmarks to show whether it works or not on your workload. If I find that this is beneficial for some people, I may go ahead and release the code as part of Maatkit.
This release contains bug fixes and new features. Click through to the full article for the details. I'll also write more about the changes in a separate article.
... I didn't get two-way sync done, and I didn't get the Nibble algorithm done. That much I expected. But I also didn't get the current work released tonight because I'm paranoid about breaking things. I'm trying to go through all the tools and write at least a basic test for them to be sure they can do the simplest "unit of work" (such as mk-find running and printing out that it finds the mysql.columns_priv table).
It's good that I'm doing this. I found that mk-heartbeat suddenly doesn't work on my Ubuntu 7.10 laptop. It goes into infinite sleep. Can anyone repro this and/or diagnose? The same code works fine on my Gentoo servers at work.
Hopefully I'll be able to release something very soon. Release early/often is fine, but "knowingly release brokenness" isn't in my code of conduct :)
I created MySQL Table Checksum because I was certain replication slaves were slowly drifting out of sync with their masters, and there was no way to prove it. Once I could prove it, I was able to show that replication gets out of sync for lots of people, lots of times. (If you really want to hear war stories, you should probably talk to one of the MySQL support staff or consulting team members; I'm sure they see this a lot more than I do).
I finally figured out what was causing one of my most persistent and annoying out-of-sync scenarios. It turns out to be nothing earth-shaking; it's just an easy-to-overlook limitation of statement-based replication. You could call it a bug, but as far as I can see, there's no way to fix it with statement-based replication. (I'd love to be proven wrong). Read on for the details.
Thanks to a patch Dane Miller submitted, MySQL Heartbeat can now be used to monitor Slony-I replication lag for PostgreSQL. It works identically: it updates a record on the master and looks for it on the slave. You just have to specify the database driver on the command-line:
mysql-heartbeat [options] --dbidriver=Pg
Most of the tools in the MySQL Toolkit are very MySQL-specific. This one just happened to be an exception, and I'm happy it's useful for more than the original purpose.
As I wrote a few days ago, I'm writing the replication chapter for the second edition of High Performance MySQL. I'm writing about replication filtering rules right now, and I thought it would be good to get input on this. If you have favorite replication filtering tricks you'd like to share, or tasks that always frustrate and/or confuse you, please post them in the comments. I'm making a section that shows how to accomplish common filtering and rewriting needs, such as preventing GRANT statements from replicating to the slaves.
Thanks very much! I hope the community involvement will make this book more useful for everyone.
Continuing in the tradition, which I hope has been as helpful to you as it has been to me, I'm opening the floor for suggestions on chapter 9 of the upcoming High Performance MySQL, Second Edition. Unlike the other chapters for which I've listed outlines, this one isn't substantially written yet. It's in detailed outline form at this point (a tactic that has worked very well for us so far -- I'll write about that someday).
I'm trying to get feedback much earlier in this chapter's lifecycle, for several reasons. Two of the most important are that this is one of the first chapters I've had a chance to really take from scratch, and the chapters I haven't written from scratch have been harder to organize, as you've probably seen from the last few outlines I posted. There's a lot of value in working top-down on this deep encyclopedia-style material.
Read on for the outline and more thoughts I just can't keep to myself.
SQL
high
replication
scaling
availability
balancing
failover
I am considering taking some time off work to concentrate deeply on MySQL Table Sync, which has been getting usage in very large companies whose names we all know. There are a lot of bugs and feature requests outstanding for it. It is overly complex, needs a lot of work, and I can't do it in one-hour or even three-hour chunks. I need to focus on it. I'm considering asking for a bounty of $2500 USD for this. Please let me know what you think of this; it seems to be a successful way to sponsor development on some other projects, like Vim.
For the amount of time I think this will take, $2500 is far below my per-hour consulting rate; I considered setting the bounty higher, but I think this will be a fair amount.
I would not begin this project before December at the earliest, so there's some time to raise funds and time for me to continue working on High Performance MySQL. I would like a volunteer to coordinate the fund-raising for me. It should be trivial, but I don't want to do it myself, for several reasons. I can publicize the bounty on this blog and the project mailing list, and contact some of the corporations that have asked me for features. I doubt it will be hard to raise the money.
I'm not committing to this, just proposing it, though I did run it by my employer, who is very supportive. Here's the list of features I propose to implement:
Alternatively, if someone wants to do it and just contribute the code to the project, I'd be delighted. I doubt that will happen, though, and there'd still be a lot of work in it for me, so I think it's probably more realistic that I will do it.
I wrote a couple weeks ago about my work on the Backup and Recovery chapter for High Performance MySQL, 2nd Edition. Thanks for your comments and suggestions, and thanks to those of you who helped me over email as well.
I've had several questions about what is included in the chapter, so I thought I'd post the outline as it stands now.
This release of MySQL Toolkit adds a new tool, fixes some minor bugs, and adds new functionality to several of the tools.
Is MySQL giving you the error message "ERROR 1200 (HY000): The server is not configured as slave; fix in config file or with CHANGE MASTER TO" when you try to run START SLAVE? There are a few simple troubleshooting steps to take, but I always forget what to do. This article is to help me remember in the future!
SQL
MySQL
configuration
master
replication
troubleshooting
slave
I've just released changes to two of the tools in MySQL Toolkit. MySQL Table Checksum got some convenient functionality to help you recursively check slaves for bad replicated checksum chunks. MySQL Archiver got statistics-gathering functionality to help you optimize your archiving and purging jobs, plus a few important bug fixes.
This release fixes some bugs and improves the chunking functionality. MySQL Table Checksum had a few minor bugs and one major bug with the chunking functionality. I also rewrote the chunking, though the behavior is backwards compatible. I am very happy with the way it works now, and will probably not make any more incompatible changes to it. The changes enabled me to add support for chunking on float, double and decimal columns.
It still doesn't support chunking on character-based columns, though I know now how I'll do it if I do. Also, support for ENUM and SET shouldn't be hard to add. I have no need for these features myself. If you need it, please file a bug report on the Sourceforge tracker.
This release fixes some bugs and improves the chunking functionality. MySQL Table Checksum had a few minor bugs and one major bug with the chunking functionality. I also rewrote the chunking, though the behavior is backwards compatible. I am very happy with the way it works now, and will probably not make any more incompatible changes to it. The changes enabled me to add support for chunking on float, double and decimal columns.
It still doesn't support chunking on character-based columns, though I know now how I'll do it if I do. Also, support for ENUM and SET shouldn't be hard to add. I have no need for these features myself. If you need it, please file a bug report on the Sourceforge tracker.
MySQL Toolkit distribution 620 updates documentation and test suites, includes some major bug fixes and functionality changes, and adds one new tool to the toolkit. This article is mostly a changelog, with some added notes.
Many of the tools have matured and I just needed to make the documentation top-notch, but there's still a lot to be done on the crucial checksumming and syncing tools. Time is in short supply for me right now, though. In fact, I actually finished this release on June 22, but wasn't able to release it till just tonight!