It felt like the right time for us to look back at some useful commands for table maintenance that some of us may not have mastered as much as we might like to think.
In my post about gathering index statistics, I referred to OPTIMIZE TABLE, ANALYZE TABLE, and REPAIR TABLE — but I never explained in depth what the different commands do, and what the differences between them are. That is what I thought I would do with this post, focusing on InnoDB and MyISAM, and the differences in how they treat those commands. I will also look at different cases and see which one is right for in each case.
I've been doing some benchmarking recently to satisfy the curiosity about 5.1's performance compared with 4.1. The major question this time revolves around how much additional performance an external RAID array can provide (for us it's typically beyond the 6 drives a Dell 2950 can hold).
In previous search benchmarks, I utilized random content generated with Drupal's devel module. In these latest benchmarks, I used an actual sanitized copy of the Drupal.org community website database, with email addresses and passwords removed. The first tests were intended to confirm that Xapian continues to perform well with large amounts of actual data. Additional tests were performed to measure the effect of various MySQL tunings and configurations. The following data was derived from several hundred benchmarks run on an Amazon AWS instance over the past week using the SearchBench module.
These tests confirm that Xapian continues to offer better search performance than Drupal's core search module. Contrary to popular belief, the data also shows that using the InnoDB storage engine for search tables significantly outperforms using the MyISAM storage engine for search tables, especially when your database server has sufficient RAM. The data also confirms that allocating additional RAM for MySQL's temporary tables can also improve search performance.
There's been a lot of rumors floating around internally at Yahoo that it's best to turn off some of your CPU cores when using Innodb, especially if you have a machine with > 4 cores. At this point there's no question in my mind that Innodb doesn't perform much better when you double your cores from 4 to 8, but I really wanted to know if 8 actually performed worse.
It’s time to continue our series on the transactional storage engines for MySQL. Some might question why I even include Falcon because it is very much beta at this time. MySQL, however, has made quite an investment into Falcon, and while it is currently beta, the code is improving and it looks like that it will be production-worthy when MySQL server 6.0 hits GA.
If this is the case, it is important to begin to understand what Falcon was designed for and how it differs from other transactional engines such as InnoDB. I am going to concentrate quite a bit on the Falcon/InnoDB comparison as that is what everyone wants to talk about. This is despite my having heard MySQL employees repeatedly make statements to the effect of, “Falcon is not going to replace InnoDB,” or “Falcon is not competing with InnoDB.” Well, take that with a grain of salt. It certainly seems to me that they are competing for the same spot.
Warning
As I said, Falcon is beta. First off, don’t even try to use it in production. Using it in production means you will also be using MySQL Server 6.0, which itself is considered alpha. Your data will explode, be corrupted, or eaten by jackals. It won’t be pretty. It will cause great pain.
In addition, the features of Falcon are still changing. What I say here might or might not be accurate in the future.
End of Warning
So, why was Falcon even created?
MySQL
replication
InnoDB
falcon
mvcc
row-based
statement-based
InnoDB is a storage engine that uses MVCC (described shortly) to provide ACID-compliant transactional data storage using row-level locking. MVCC stands for Multi-Version Concurrency Control. It is how InnoDB allows multiple transactions to look at a data set of one or more tables and have a consistent view of the data. MVCC keeps a virtual snapshot of the dataset for each transaction. An example will make this clear.
Let’s assume you have two transactions (and only two transactions) running on a system. If transaction A starts at 10:45:56 and ends at 10:45:89, it gets a consistent view of the dataset during the time that the transaction runs. If transaction B starts at 10:45:65, it would see exactly the same view of the dataset that transaction A saw when it began the transaction. If transaction B started at 10:45:95, it would see the modified dataset after transaction A made modifications. During the duration of each transaction, the dataset that each sees does not change, except for the modifications the transaction itself makes. Consider that a typical production database server is running hundreds of queries a second, and you realize that the job of MVCC/the InnoDB storage engine gets very complicated maintaining all these views of the data.
MySQL server storage engines use three different locking options: table-level locking, page-level locking, and row-level locking. With table-level locking, if a query accesses the table it will lock the entire table and not allow access to the table from other queries. The benefit of this is that it entirely eliminates deadlocking issues. The disadvantage is that, as mentioned, no other queries have access to the table while it is locked. If you had a table with 16,000,000 rows and needed to modify one row, the entire table is inaccessible by other queries. The MyISAM and memory storage engine use table-level locking.
Page-level locking is locking of a group of rows instead of a the entire table. The number of rows actually locked will vary based on a number of factors. Going back to our example of a 16,000,000-row table, lets assume a page-level lock is used. If a page consists of 1,000 rows (this would vary depending on the size of the rows and the actual amount of memory allocated to a page), a lock would lock only a thousand rows. Any of the other 15,999,000 rows could be used by other queries without interference. The BDB storage engine uses page-level locking.
Row-level locking, as the name suggests, acquires a lock on as small an amount as a single row from a table. This will block the minimal amount of table content and allows for the most concurrency on a table without problems. InnoDB and Falcon both use row-level locking.
While the InnoDB configuration options are not strictly related to the transactional characteristics (other than innodb_flush_log_at_trx_commit), I thought it would be useful to have them here for reference. These are the most common or most important configuration parameters:
MySQL
locking
InnoDB
mvcc
innodb_data_file_path
innodb_file_per_table
innodb_buffer_pool_size
Recently, a customer wondered if they should start using the innodb_file_per_table option, or if they should continue to use the large InnoDB tablespace files created by the innodb_data_file_path option in the my.cnf option file.
Many people still use the older innodb_data_file_path option because it is the default for MySQL server. So, what are the benefits of using innodb_file_per_table instead?
The innodb_file_per_table makes for easier-to-manage files. With this option each InnoDB table has its own data and index file under the database directory. As an example, if you had table foo located in database xyz the InnoDB data file for table foo would be /var/lib/mysql/data/xyz/foo.idb. Each table would have its own idb table in the appropriate database directory. This is in contrast to using the innodb_data_file_path option with (typically) one large file in the root of your data directory. For example, it might be /var/lib/mysql/data/ibdata1.idb. All table data and indexes would be stored in this one file, and it can be very large and unwieldy. I don’t recall the largest ibdata file I have seen, but what do you do if you have a 100 gig InnoDB tablespace file? It can, and does, happen. The file contains what amounts to all the data of all your databases on the server.
First of all, I have to apologize about some of my previous remark on 5.0 performance. I passed ‘-g’ CFLAGS to my build, and that replaced default ‘-O2′. Compiling MySQL without -O2 or -O3 makes it slower. Apparently, much slower.
Few migration notes - once I loaded the schema with character set set to binary (because we treat it as such), all VARCHAR fields were converted to VARBINARY, what I expected, but more annoying was CHAR converted to BINARY - which pads data with \0 bytes. Solution was converting everything into VARBINARY - as actually it doesn’t have much overhead. TRIM('\0' FROM field) eventually helped too.
The other problem I hit was paramy operation issue. One table definition failed, so paramy exited immediately - though it had few more queries remaining in the queue - so most recent data from some table was not inserted. The cheap workaround was adding -f option, which just ignores errors. Had to reload all data though…
I had real fun experimenting with auto-inc locking. As it was major problem for initial paramy tests, I hacked InnoDB not to acquire auto-inc table-level lock (that was just commenting out few lines in ha_innodb.cc). After that change CPU use went to >300% instead of ~100% - so I felt nearly like I’ve done the good thing. Interesting though - profile showed that quite a lot of CPU time was spent in synchronization - mutexes and such - so I hit SMP contention at just 4 cores. Still, the import was faster (or at least the perception), and I already have in mind few cheap tricks to make it faster (like disabling mempool). The easiest way to make it manageable is simply provide a global variable for auto-inc behavior, though elegant solutions would attach to ‘ALTER TABLE … ENABLE KEYS’ or something similar.
Once loaded, catching up on replication was another task worth few experiments. As the data image was already quite a few days old, I had at least few hours to try to speed up replication. Apparently, Jay Janssen’s prefetcher has disappeared from the internets, so the only one left was maatkit’s mk-slave-prefetch. It rewrites UPDATEs into simple SELECTs, but executes them just on single thread, so the prefetcher was just few seconds ahead of SQL thread - and speedup was less than 50%. I made a quick hack that parallelized the task, and it managed to double replication speed.
Still, there’re few problems with the concept - it preheats just one index, used for lookup, and doesn’t work on secondary indexes. Actually analyzing the query, identifying what and where changes, and sending a select with UNIONs, preheating every index affected by write query could be more efficient. Additionally it would make adaptive hash or insert buffers useless - as all buffer pool pages required would be already in memory - thus leading to less spots of mutex contention.
We also managed to hit few optimizer bugs too, related to casting changes in 5.0. Back in 4.0 it was safe to pass all constants as strings, but 5.0 started making poor solutions then (like filesorting, instead of using existing ref lookup index, etc). I will have to review why this happens, does it make sense, and if not - file a bug. For now, we have some workarounds, and don’t seem to be bitten too much by the behavior.
Anyway, in the end I directed half of this site’s core database off-peak load to this machine, and it was still keeping up with replication at ~8000 queries per second. The odd thing yet is that though 5.0 eats ~30% more CPU, it shows up on profiling as faster-responding box. I guess we’re just doing something wrong.
I’ve published our MySQL branch at launchpad. Do note, release process is somewhat ad-hoc (or non-existing), and engineer doing it is clueless newbie. :)
I had plans to do some more scalability tests today, but apparently the server available is just two-core machine, so there’s nothing much I can do on it. I guess another option is grabbing some 8-core application server and play with it. :)
InnoDB maintains two checksums per buffer pool block. Old formula of checksum, and new formula of checksum. Both are read, both are written. I guess this had to be some kind of transition period, but it obviously took too long (or was forgotten). Anyway, disabling checksums code entirely makes single-thread data load 7% faster - though in parallel activity locking contention provides with some extra CPU resources for checksum calculation.
Leaving just single version of checksum would cut this fat in half, without abandoning the feature entirely - probably worth trying.
Update: Benchmarked InnoDB checksum against Fletcher. Results were interesting (milliseconds for 10000 iterations):
| Algorithm: | InnoDB | Fletcher |
| - | 826 | 453 |
| -O2: | 316 | 133 |
| -O3: | 42 | 75 |
So, though using Fletcher doubles the performance, -O3 optimizes InnoDB checksumming much better. How many folks do run -O3 compiled mysqld?
If implementing ‘SHOW INNODB LOCKS’ took half an hour (literally, includes compiling and testing) for a non-developer type of guy (cause most of code was written anyway), why the heck it takes ten years (or more?) to get such feature to standard release?