» tagged pages
» logout

sorted by: recent | see : popular
Content Tagged with InnoDB + MySQL

Notes from land of I/O

A discussion on IRC sparkled some interest on how various I/O things work in Linux. I wrote small microbenchmarking program (where all configuration is in source file, and I/O modes can be changed by editing various places in code ;-), and started playing with performance.

The machine for this testing was RAID10 16disk box with 2.6.24 kernel, and I tried to understand how O_DIRECT works, and how fsync() works and ended up digging into some other stuff.

My notes for now are:

  • O_DIRECT serializes writes to a file on ext2, ext3, jfs, so I got at most 200-250w/s.
  • xfs allows parallel (and out-of-order, if that matters) DIO, so I got 1500-2700w/s (depending on file size - seek time changes.. :) of random I/O without write-behind caching. There are few outstanding bugs that lock this down back to 250w/s (#xfs@freenode: “yeah, we drop back to taking the i_mutex in teh case where we are writing beyond EOF or we have cached pages”, so
    posix_fadvise(fd, 0, filesize, POSIX_FADV_DONTNEED)

    helps).

  • fsync(),sync(),fdatasync() wait if there are any writes, bad part - it can wait forever. Filesystems people say thats a bug - it shouldn’t wait for I/O that happened after sync being called. I tend to believe, as it causes stuff like InnoDB semaphore waits and such.

Of course, having write-behind caching at the controller (or disk, *shudder*) level allows filesystems to be lazy (and benchmarks are no longer that different), but having the upper layers work efficiently is quite important too, to avoid bottlenecks.

It is interesting, that write-behind caching isn’t needed that much anymore for random writes, once filesystem parallelizes I/O, even direct, nonbuffered one.

Anyway, now that I found some of I/O properties and issues, should probably start thinking how they apply to the upper layers like InnoDB.. :)

MySQL: Planet MySQL

Why You Want to Switch to MySQL 5.1

In two words: online operations. In a paragraph: Forget partitioning, row-based replication and events. The big reasons most people are going to salivate over 5.1, and probably start plans to upgrade now, are the online operations:

  • online ALTER TABLE for column rename, column default value change, and adding values to the end of an ENUM/SET
  • Online, table-based logging. No more need to restart your server to enable or change the general or slow query logs. You can have the standard file-based output or choose a table format…which you can query.
  • (more…)

MySQL: Planet MySQL

Making MySQL more usable: InnoDB information_schema patches

Lately I’ve been working on a series of patches to increase the usability of InnoDB, based on the InnoDB plugin for MySQL 5.1. I have a lot of ideas still, but I’ve currently gotten two proof of concept patches working. Both of the patches require a small patch to relocate the buf_chunk_struct struct definition which was inadvertently defined in buf0buf.c rather than buf0buf.h.

Most patches are considered rough draft, proof of concept quality at best. They are suitable for testing, playing around, using them to gain insight about your data, but not for production use. Use at your own risk!

Using information_schema to view the buffer pool contents

This patch creates several new information_schema tables to provide visibility into the contents of the InnoDB buffer pool. The new tables are:

  • innodb_buffer_pool_pages
  • innodb_buffer_pool_pages_index
  • innodb_buffer_pool_pages_blob

Note that accessing these tables is not cheap, as the buffer pool must be scanned.

innodb_buffer_pool_pages

This table provides some generic information about all pages present in the buffer pool, and is primarily useful for getting statistics about the number and types of pages present at any given point in time. For example:

SELECT
  page_type,
  count(*) AS nr
FROM information_schema.innodb_buffer_pool_pages
GROUP BY page_type;

+-----------+------+
| page_type | nr   |
+-----------+------+
| allocated | 4051 |
| bitmap    |    2 |
| blob      |    5 |
| fsp_hdr   |    2 |
| index     |   14 |
| inode     |    2 |
| sys       |    3 |
| trx_sys   |    1 |
| undo_log  |   15 |
+-----------+------+

innodb_buffer_pool_pages_index

This table gives you detailed statistics on index pages, which for InnoDB means all data (as part of the primary key) and all defined indexes. For example (some columns removed for clarity):

SELECT *
FROM information_schema.innodb_buffer_pool_pages_index;

+-------------+------------------+------------+----------+---------+--------+-----------+--------------+...
| schema_name | table_name       | index_name | space_id | page_no | n_recs | data_size | lru_position |...
+-------------+------------------+------------+----------+---------+--------+-----------+--------------+...
| NULL        | SYS_IBUF_TABLE_0 | CLUST_IND  |        0 |       4 |      0 |         0 |            6 |...
| NULL        | SYS_INDEXES      | CLUST_IND  |        0 |      11 |      8 |       536 |            7 |...
| NULL        | SYS_TABLES       | CLUST_IND  |        0 |       8 |      5 |       351 |           12 |...
| NULL        | SYS_COLUMNS      | CLUST_IND  |        0 |      10 |     24 |      1436 |           14 |...
| NULL        | SYS_TABLES       | ID_IND     |        0 |       9 |      5 |       126 |           15 |...
| NULL        | SYS_FIELDS       | CLUST_IND  |        0 |      12 |      8 |       315 |           16 |...
| NULL        | SYS_FOREIGN      | ID_IND     |        0 |      46 |      0 |         0 |           50 |...
| NULL        | SYS_FOREIGN      | FOR_IND    |        0 |      47 |      0 |         0 |           51 |...
| NULL        | SYS_FOREIGN      | REF_IND    |        0 |      48 |      0 |         0 |           52 |...
| NULL        | SYS_FOREIGN_COLS | ID_IND     |        0 |      49 |      0 |         0 |           53 |...
| test        | bt               | PRIMARY    |       26 |       3 |      3 |        39 |           66 |...
| test        | bt               | PRIMARY    |       26 |       7 |      1 |      8124 |           69 |...
| test        | bt               | PRIMARY    |       26 |       9 |      2 |     16248 |           71 |...
| test        | bt               | PRIMARY    |       26 |       6 |      2 |     16248 |           74 |...
+-------------+------------------+------------+----------+---------+--------+-----------+--------------+...

Besides the obvious value of this, notice that you can see InnoDB’s internal tables (with schema_name as NULL) as well!

innodb_buffer_pool_pages_blob

This table gives you detailed statistics on BLOB-storage pages, which store parts of BLOBs that overflow in-row storage (another post on that later, but suffice it to say, if the BLOB is larger than 8KB… sort of). For example:

SELECT *
FROM information_schema.innodb_buffer_pool_pages_blob;

+----------+---------+------------+----------+--------------+--------------+-----------+------------+
| space_id | page_no | compressed | part_len | next_page_no | lru_position | fix_count | flush_type |
+----------+---------+------------+----------+--------------+--------------+-----------+------------+
|       26 |       5 |          0 |     9232 |            0 |           72 |         0 |          0 |
|       26 |      10 |          0 |     9232 |            0 |           73 |         0 |          0 |
|       26 |       8 |          0 |     9232 |            0 |           75 |         0 |          0 |
|       26 |      11 |          0 |     9232 |            0 |           76 |         0 |          0 |
|       26 |       4 |          0 |     9232 |            0 |           77 |         0 |          0 |
+----------+---------+------------+----------+--------------+--------------+-----------+------------+

We’d love to get the table name and PRIMARY KEY value associated with each BLOB-storage page, but I’m not sure if that’s possible.

An example of the power of these tables

Have you ever wondered what your page fill rate is? Probably not, because you didn’t realize it was either interesting or important, but we worry about these things (and often justifiably so). In short, page fill rate is the percentage of each page that is filled with data. Since InnoDB only caches whole pages, if your average page fill rate is 75%, you would be wasting 25% of the space in each page. That would mean that your table is spread across more pages than necessary, which will waste both disk space and memory you’ve allocated to the buffer pool.

Here’s a simple way to figure out the average page fill rate of pages currently in cache:

SELECT
  schema_name,
  table_name,
  index_name,
  COUNT(*) AS n_pages,
  ROUND(AVG(n_recs), 2) AS recs_per_page,
  ROUND(SUM(data_size)/1048576, 2) AS total_data_size_M,
  ROUND(AVG(data_size/(
    SELECT variable_value
    FROM information_schema.global_status
    WHERE variable_name
    LIKE 'innodb_page_size'
  )) * 100, 2) AS page_fill_pct
FROM
  information_schema.innodb_buffer_pool_pages_index
WHERE 1
AND schema_name IS NOT NULL
GROUP BY
  schema_name,
  table_name,
  index_name
ORDER BY
  total_data_size_M DESC;

+-------------+------------+-----------------+---------+---------------+-------------------+---------------+
| schema_name | table_name | index_name      | n_pages | recs_per_page | total_data_size_M | page_fill_pct |
+-------------+------------+-----------------+---------+---------------+-------------------+---------------+
| test        | n          | GEN_CLUST_INDEX |    4079 |        252.55 |             58.56 |         91.88 |
| test        | bt         | PRIMARY         |       2 |          2.50 |              0.02 |         49.70 |
| test        | p          | PRIMARY         |       1 |          2.00 |              0.00 |          0.52 |
| test        | p          | c               |       1 |          2.00 |              0.00 |          0.37 |
+-------------+------------+-----------------+---------+---------------+-------------------+---------------+

This is showing some statistics for each index from each table in the cache: the number of pages in the cache, the average number of records per page for those cached pages, the total number of megabytes of cache used, and the page fill rate in percent. There is currently no other way to calculate this information from any statistics or data that InnoDB provides.

What else?

What ideas do you have? What do you think about these features? Leave a comment to let us know!

MySQL: Planet MySQL

MySQL Back to Basics: Analyze, Check, Optimize, and Repair

It felt like the right time for us to look back at some useful commands for table maintenance that some of us may not have mastered as much as we might like to think.

In my post about gathering index statistics, I referred to OPTIMIZE TABLE, ANALYZE TABLE, and REPAIR TABLE — but I never explained in depth what the different commands do, and what the differences between them are. That is what I thought I would do with this post, focusing on InnoDB and MyISAM, and the differences in how they treat those commands. I will also look at different cases and see which one is right for in each case.

(more…)

MySQL: Planet MySQL

Innodb RAID performance on 5.1

I've been doing some benchmarking recently to satisfy the curiosity about 5.1's performance compared with 4.1.  The major question this time revolves around how much additional performance an external RAID array can provide (for us it's typically beyond the 6 drives a Dell 2950 can hold). 


These tests are done on using an MSA-30 drive enclosure with 15k-SCSI drives.  The testing framework is sysbench oltp.  The test names are hopefully fairly obvious:  selects = single selects, reads = range tests, xacts = transaction tests, etc.   Transaction tests are counting individual queries, not transactions.   The "Rdm" tests are using a uniform distribution, whereas the non-'Rdm' tests are 75% of queries are using 10% of the rows.  

read more

MySQL: Planet MySQL

Tuning Search In Drupal 5

In previous search benchmarks, I utilized random content generated with Drupal's devel module. In these latest benchmarks, I used an actual sanitized copy of the Drupal.org community website database, with email addresses and passwords removed. The first tests were intended to confirm that Xapian continues to perform well with large amounts of actual data. Additional tests were performed to measure the effect of various MySQL tunings and configurations. The following data was derived from several hundred benchmarks run on an Amazon AWS instance over the past week using the SearchBench module.

These tests confirm that Xapian continues to offer better search performance than Drupal's core search module. Contrary to popular belief, the data also shows that using the InnoDB storage engine for search tables significantly outperforms using the MyISAM storage engine for search tables, especially when your database server has sufficient RAM. The data also confirms that allocating additional RAM for MySQL's temporary tables can also improve search performance.

read more

MySQL: Planet MySQL

Innodb Multi-core Performance

There's been a lot of rumors floating around internally at Yahoo that it's best to turn off some of your CPU cores when using Innodb, especially if you have a machine with > 4 cores.  At this point there's no question in my mind that Innodb doesn't perform much better when you double your cores from 4 to 8, but I really wanted to know if 8 actually performed worse. 


To test, I used a Dell 2950 with 6 drives and a simple mysqlslap test script.  There's basically no I/O going on here, just a small table in memory being queried a lot.  To be fair, I actually got this test from Venu.  I used maxcpu=4 in my grub.conf to limit the cpus (I also tested with tasksel and it seemed to have the same effect as maxcpu).

read more

MySQL: Planet MySQL

Falcon Transactional Characteristics

It’s time to continue our series on the transactional storage engines for MySQL. Some might question why I even include Falcon because it is very much beta at this time. MySQL, however, has made quite an investment into Falcon, and while it is currently beta, the code is improving and it looks like that it will be production-worthy when MySQL server 6.0 hits GA.

If this is the case, it is important to begin to understand what Falcon was designed for and how it differs from other transactional engines such as InnoDB. I am going to concentrate quite a bit on the Falcon/InnoDB comparison as that is what everyone wants to talk about. This is despite my having heard MySQL employees repeatedly make statements to the effect of, “Falcon is not going to replace InnoDB,” or “Falcon is not competing with InnoDB.” Well, take that with a grain of salt. It certainly seems to me that they are competing for the same spot.

Warning

As I said, Falcon is beta. First off, don’t even try to use it in production. Using it in production means you will also be using MySQL Server 6.0, which itself is considered alpha. Your data will explode, be corrupted, or eaten by jackals. It won’t be pretty. It will cause great pain.

In addition, the features of Falcon are still changing. What I say here might or might not be accurate in the future.

End of Warning

So, why was Falcon even created?

(more…)

MySQL: Planet MySQL

InnoDB Transactional Characteristics

InnoDB is a storage engine that uses MVCC (described shortly) to provide ACID-compliant transactional data storage using row-level locking.  MVCC stands for Multi-Version Concurrency Control.  It is how InnoDB allows multiple transactions to look at a data set of one or more tables and have a consistent view of the data. MVCC keeps a virtual snapshot of the dataset for each transaction.  An example will make this clear.

Let’s assume you have two transactions (and only two transactions) running on a system. If transaction A starts at 10:45:56 and ends at 10:45:89, it gets a consistent view of the dataset during the time that the transaction runs.  If transaction B starts at 10:45:65, it would see exactly the same view of the dataset that transaction A saw when it began the transaction.  If transaction B started at 10:45:95, it would see the modified dataset after transaction A made modifications. During the duration of each transaction, the dataset that each sees does not change, except for the modifications the transaction itself makes.  Consider that a typical production database server is running hundreds of queries a second, and you realize that the job of MVCC/the InnoDB storage engine gets very complicated maintaining all these views of the data.

MySQL server storage engines use three different locking options: table-level locking, page-level locking, and row-level locking.  With table-level locking, if a query accesses the table it will lock the entire table and not allow access to the table from other queries.  The benefit of this is that it entirely eliminates deadlocking issues. The disadvantage is that, as mentioned, no other queries have access to the table while it is locked.  If you had a table with 16,000,000 rows and needed to modify one row, the entire table is inaccessible by other queries. The MyISAM and memory storage engine use table-level locking.

Page-level locking is locking of a group of rows instead of a the entire table. The number of rows actually locked will vary based on a number of factors. Going back to our example of a 16,000,000-row table, lets assume a page-level lock is used.  If a page consists of 1,000 rows (this would vary depending on the size of the rows and the actual amount of memory allocated to a page), a lock would lock only a thousand rows.  Any of the other 15,999,000 rows could be used by other queries without interference. The BDB storage engine uses page-level locking.

Row-level locking, as the name suggests, acquires a lock on as small an amount as a single row from a table. This will block the minimal amount of table content and allows for the most concurrency on a table without problems. InnoDB and Falcon both use row-level locking.

While the InnoDB configuration options are not strictly related to the transactional characteristics (other than innodb_flush_log_at_trx_commit), I thought it would be useful to have them here for reference.  These are the most common or most important configuration parameters:

(more…)

MySQL: Planet MySQL

Differences Between innodb_data_file_path and innodb_file_per_table

Recently, a customer wondered if they should start using the innodb_file_per_table option, or if they should continue to use the large InnoDB tablespace files created by the innodb_data_file_path option in the my.cnf option file.

Many people still use the older innodb_data_file_path option because it is the default for MySQL server. So, what are the benefits of using innodb_file_per_table instead?

The innodb_file_per_table makes for easier-to-manage files. With this option each InnoDB table has its own data and index file under the database directory. As an example, if you had table foo located in database xyz the InnoDB data file for table foo would be /var/lib/mysql/data/xyz/foo.idb. Each table would have its own idb table in the appropriate database directory. This is in contrast to using the innodb_data_file_path option with (typically) one large file in the root of your data directory. For example, it might be /var/lib/mysql/data/ibdata1.idb. All table data and indexes would be stored in this one file, and it can be very large  and unwieldy. I don’t recall the largest ibdata file I have seen, but what do you do if you have a 100 gig InnoDB tablespace file? It can, and does, happen. The file contains what amounts to all the data of all your databases on the server.

(more…)

MySQL: Planet MySQL

5.0 journal: various issues, replication prefetching, our branch

First of all, I have to apologize about some of my previous remark on 5.0 performance. I passed ‘-g’ CFLAGS to my build, and that replaced default ‘-O2′. Compiling MySQL without -O2 or -O3 makes it slower. Apparently, much slower.

Few migration notes - once I loaded the schema with character set set to binary (because we treat it as such), all VARCHAR fields were converted to VARBINARY, what I expected, but more annoying was CHAR converted to BINARY - which pads data with \0 bytes. Solution was converting everything into VARBINARY - as actually it doesn’t have much overhead. TRIM('\0' FROM field) eventually helped too.

The other problem I hit was paramy operation issue. One table definition failed, so paramy exited immediately - though it had few more queries remaining in the queue - so most recent data from some table was not inserted. The cheap workaround was adding -f option, which just ignores errors. Had to reload all data though…

I had real fun experimenting with auto-inc locking. As it was major problem for initial paramy tests, I hacked InnoDB not to acquire auto-inc table-level lock (that was just commenting out few lines in ha_innodb.cc). After that change CPU use went to >300% instead of ~100% - so I felt nearly like I’ve done the good thing. Interesting though - profile showed that quite a lot of CPU time was spent in synchronization - mutexes and such - so I hit SMP contention at just 4 cores. Still, the import was faster (or at least the perception), and I already have in mind few cheap tricks to make it faster (like disabling mempool). The easiest way to make it manageable is simply provide a global variable for auto-inc behavior, though elegant solutions would attach to ‘ALTER TABLE … ENABLE KEYS’ or something similar.

Once loaded, catching up on replication was another task worth few experiments. As the data image was already quite a few days old, I had at least few hours to try to speed up replication. Apparently, Jay Janssen’s prefetcher has disappeared from the internets, so the only one left was maatkit’s mk-slave-prefetch. It rewrites UPDATEs into simple SELECTs, but executes them just on single thread, so the prefetcher was just few seconds ahead of SQL thread - and speedup was less than 50%. I made a quick hack that parallelized the task, and it managed to double replication speed.

Still, there’re few problems with the concept - it preheats just one index, used for lookup, and doesn’t work on secondary indexes. Actually analyzing the query, identifying what and where changes, and sending a select with UNIONs, preheating every index affected by write query could be more efficient. Additionally it would make adaptive hash or insert buffers useless - as all buffer pool pages required would be already in memory - thus leading to less spots of mutex contention.

We also managed to hit few optimizer bugs too, related to casting changes in 5.0. Back in 4.0 it was safe to pass all constants as strings, but 5.0 started making poor solutions then (like filesorting, instead of using existing ref lookup index, etc). I will have to review why this happens, does it make sense, and if not - file a bug. For now, we have some workarounds, and don’t seem to be bitten too much by the behavior.

Anyway, in the end I directed half of this site’s core database off-peak load to this machine, and it was still keeping up with replication at ~8000 queries per second. The odd thing yet is that though 5.0 eats ~30% more CPU, it shows up on profiling as faster-responding box. I guess we’re just doing something wrong.

I’ve published our MySQL branch at launchpad. Do note, release process is somewhat ad-hoc (or non-existing), and engineer doing it is clueless newbie. :)

I had plans to do some more scalability tests today, but apparently the server available is just two-core machine, so there’s nothing much I can do on it. I guess another option is grabbing some 8-core application server and play with it. :)

MySQL: Planet MySQL

On checksums

InnoDB maintains two checksums per buffer pool block. Old formula of checksum, and new formula of checksum. Both are read, both are written. I guess this had to be some kind of transition period, but it obviously took too long (or was forgotten). Anyway, disabling checksums code entirely makes single-thread data load 7% faster - though in parallel activity locking contention provides with some extra CPU resources for checksum calculation.
Leaving just single version of checksum would cut this fat in half, without abandoning the feature entirely - probably worth trying.

Update: Benchmarked InnoDB checksum against Fletcher. Results were interesting (milliseconds for 10000 iterations):

Algorithm: InnoDB Fletcher
- 826 453
-O2: 316 133
-O3: 42 75

So, though using Fletcher doubles the performance, -O3 optimizes InnoDB checksumming much better. How many folks do run -O3 compiled mysqld?

MySQL: Planet MySQL

InnoDB map: bulk inserting

So, what does InnoDB do while you’re loading the data in parallel?
It looks something like this:

Click on image for larger (8MB) version, if you dare, have efficient browser and lots of spare RAM (7k*10k canvas). Generated by:

opreport | Gprof2Dot | dot

MySQL: Planet MySQL

MySQL: How do you install innotop to monitor innodb in real time?

Innotop is a very useful tool to monitor innodb information in real time. This tool is written by Baron Schwartz who is also an author of “High Performance MySQL, Second edition” book. [Side note: I highly recommend getting this book when it comes out (in June, 08?). Other authors include: Peter Zaitsev, Jeremy [...]

MySQL: Planet MySQL

SHOW INNODB LOCKS

If implementing ‘SHOW INNODB LOCKS’ took half an hour (literally, includes compiling and testing) for a non-developer type of guy (cause most of code was written anyway), why the heck it takes ten years (or more?) to get such feature to standard release?

MySQL: Planet MySQL

Sample my.cnf file for InnoDB databases

Brian Moon suggest that community provided example my.cnf files would be a great thing to have on MySQLforge in this recent post: http://doughboy.wordpress.com/2008/05/06/example-mycnf-files/

I pulled out the "innodb heavy" config sample file and modified it with the standard settings that I typically start with when setting up a new InnoDB master. I've also modified the comments in the file a bit and have added some of my own too. I removed the sample slave configuration parameters (master-host, etc) because you should be using 'CHANGE MASTER TO'.

He suggested tagging such files with a 'mycnf' tag and very kindly tagged mine after I posted it :)

Feel free to share yours too and please feel free to make any comments about my configuration choices.

You can find it here (along with any other mycnf tagged files):
http://forge.mysql.com/tools/search.php?t=tag&k=mycnf

MySQL: Planet MySQL

InnoDB not releasing a row lock?

This is a bit surprise when we encountered a case where InnoDB is not releasing its row lock when there is an error condition within the transaction. And I verified with Falcon, Oracle, SQL Server and Sybase; all seemed to work as expected.

For example; just open a transaction in a session and execute a error statement (lets say duplicate key) and on the other new session try to get a row lock on the same record (use where clause with FOR UPDATE) and you will notice that InnoDB blocks on this statement until you issue a explicit rollback or commit. But remember there is nothing happened on the first session other than duplicate error on that row. So, InnoDB should implicitly unlock the row when there is an error; and looks like it is not doing that.

Here is the scenario:

First create a single column table and populate some rows (lets say 20 rows in this case) on any version of MySQL/InnoDB.

mysql> create table t1(c1 int not null auto_increment primary key)Engine=InnoDB;
Query OK, 0 rows affected (0.14 sec)

mysql> insert into t1 values(),(),(),(),(),(),(),(),(),();
Query OK, 10 rows affected (0.12 sec)
Records: 10  Duplicates: 0  Warnings: 0

mysql> insert into t1 values(),(),(),(),(),(),(),(),(),();
Query OK, 10 rows affected (0.01 sec)
Records: 10  Duplicates: 0  Warnings: 0

and then execute the following statements; first in session-I and then in session-II and notice that session-II select statement hangs till you explicitly release the transaction in session-I.

Session-I Session-II
mysql> begin;
Query OK, 0 rows affected (0.00 sec)

mysql> insert into t1 values(10);
ERROR 1062 (23000): Duplicate entry ‘10′ for key ‘PRIMARY’

mysql> begin;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from t1 where c1=10 for update;

Even though am not really sure whether this is a bug or feature in InnoDB (I suspect this as a bug); but Oracle, SQL Server or Sybase will not block the select and releases the lock on the duplicate error. Even Falcon engine in MySQL 6.0 does seem to release the lock appropriately.

MySQL: Planet MySQL

News flash: MySQL 5.1 has zero bugs

Zack Urlocker says MySQL 5.1 has zero bugs. He may have been misquoted, or quoted out of context, but there it is. I’ll quote enough of it that you can’t take it out of context twice:

Mickos also said MySQL 5.1 has upgraded its reliability and ease of use over 2005’s v5.0.

“Now we can admit it, but this version is much improved over 5.0, which we weren’t totally happy with,” Mickos confided.

He reported that more than 1,300 bugs (997 in 2007, 386 so far in 2008) have been fixed in v5.1, and that, according to standard DBT2 benchmarks, the performance of v5.1 is 10 to 15 percent better than the previous version.

“This version now has zero bugs,” Urlocker told eWEEK.

You can check for yourself at the MySQL bug statistics page.

Of course it’s not true. But what did Zack really say, I wonder?

, , , ,

MySQL: Planet MySQL

InnoDB plugin row format performance

Here is a quick comparison of the new InnoDB plugin performance between different compression, row formats that is introduced recently.

The table is a pretty simple one:

CREATE TABLE `sbtest` (
  `id` int(10) unsigned NOT NULL,
  `k` int(10) unsigned NOT NULL DEFAULT ‘0′,
  `c` char(120) NOT NULL DEFAULT ,
  `pad` char(60) NOT NULL DEFAULT ,
  PRIMARY KEY (`id`),
  KEY `k` (`k`)
) ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;

The table is populated with 10M rows with average row length being 224 bytes. The tests are performed for Compact, Dynamic and Compressed (8K and 4K)  row formats using MySQL-5.1.24 with InnoDB plugin-1.0.0-5.1 on Dell PE2950  1x Xeon quad core with 16G RAM, RAID-10 with RHEL-4 64-bit.

Here are the four test scenarios:

  1. No compression, ROW_FORMAT=Compact
  2. ROW_FORMAT=Compressed with KEY_BLOCK_SIZE=8
  3. ROW_FORMAT=Compressed with KEY_BLOCK_SIZE=4
  4. ROW_FORMAT=Dynamic

All the above tests are repeated with innodb_buffer_pool_size=6G and 512M to make sure one fits everything in memory and another one overflows. The rest of the InnoDB settings are all default except that innodb_thread_concurrency=32.

Here is the summary of the test results:

Table Load:

Load time from a dump of SQL script having 10M rows (not batched)

Compact Compressed (8K) Compressed (4K) Dynamic
28m 18s 29m 46s 36m 43s 27m 55s

File Sizes:

Here is the size of the .ibd file after each data load

Compact Compressed (8K) Compressed (4K) Dynamic
2.3G 1.2G 592M 2.3G

Data and Index Size from Table Status:

Here is the Data and Index size in bytes from SHOW TABLE STATUS and you can see the original data size here rather than the compressed size

  Compact Compressed (8K) Compressed (4K) Dynamic
Data 2247098368 2247098368 2249195520 2247098368
Index 137019392 137035776 160301056 137019392

Compression Stats:

Here is the compression stats after the table is populated from information_schema.InnoDB_cmp; and you notice that 4K takes more operations and time for both compression and un-compression

  Page_size Compress_ops Compress_ops_ok Compress_time Uncompress_ops Uncompress_time
8K 8192 446198 445598 73 300 0
4K 4096 1091421 1012917 463 38801 13

Performance:

Here is the performance of various row formats with threads ranging from 1-512 for both 512M and 6G buffer pool size for both concurrent reads and writes.

compress512m

compress6g

Observations:

Few key observations from the performance tests that I performed without looking to any of the sources, as I could be wrong, someone can correct me here. Its hard to draw from these input scenarios, but helps to estimate what is what.

  • The load time is almost same except that the 4K compression seems to take longer than the rest; and compression in general is hitting the INSERT/Load performance a little bit.
  • Compact or Dynamic, there is no compression; so the data and index file sizes will be almost same
  • The SHOW TABLE STATUS for compressed table will have its original Data_Length and Index_Length statistics rather than the compressed statistics (may be a bug or InnoDB needs to extend SHOW TABLE STATUS to show any compressed sizes or other means, right now only option is to view your files manually)
  • 8K compression reduced the .ibd file by nearly 50% (1.2G out of 2.3G) and 4K compression reduced the size by 1/4th (592M out of 2.3G); and it could vary based on table types and data.
  • 8K compression takes less ops and time for both compression and de-compression when compared to 4K (obvious)
  • When there is enough Innodb buffer pool size to act data in memory, the compression is a bit overhead, but you will be saving space
  • When there is a overflow from buffer pool (IO bound), compression seems to really help
  • 4K compression in general seems to be slower when compared with 8K or any other row_format.

MySQL: Planet MySQL

MySQL Engines: MyISAM vs. InnoDB

Why use InnoDB?

InnoDB is commonly viewed as anything but performant, especially when compared to MyISAM. Many actually call it slow. This view is mostly supported by old facts and mis-information. In reality, you would be very hard-pressed to find a current, production-quality MySQL Database Engine with the CPU efficiency of InnoDB. It has its performance "quirks" and there are definitely workloads for which it is not optimal, but for standard OLTP (Online Transaction Processing) loads, it is tough to find a better, safer fit.

read more

MySQL: Planet MySQL

Notes from Falcon from the beginning

Here is the quick notes from the session Falcon from the beginning by Jim Starkey and Ann Harrison

  • Why Falcon
    • Hardware is evolving rapidly, world is changing, so taking advantage
    • Customers need ACID transactions
  • Where hardware is going
    • CPUS breed like rabbits (more sockets, cores, threads/core)
    • Memory is bigger, faster and cheaper
    • Disks are bigger and cheaper but not much faster
    • In general boxes are getting cheaper
  • Where applications are going
    • batch - dead
    • timesharing - dead
    • departmental computing - dead
    • client server - fading fast
    • application servers for most of us
    • web services for the really big buys
  • Database Challenges
    • Traditional challenge
    • exhaust CPU, memory and disk simultaneously
  • Tradeoffs
    • use memory to page cache to avoid disk reads
    • record cache to avoid page cache manipulation
    • use CPU to find the fastest path to record
    • use CPU to minimize record size
    • Synchronize most data structures with user mode read/write locks
    • Synchronize high contention data structures with interlocked instructions
  • Architecture
    • Incomplete in-memory db with disk backfill
    • Multi-version concurrency control in memory
    • Updates in memory until commit
    • Group commits to a single serial log write
    • post-commit multi-threaded pipe line to move updates to disk
  • Incomplete in-memory database
    • records cached in memory
    • separate cache for disk pages
    • record cache hits 15% the cost of a page cache hit
    • record cache is more memory efficient than page cache
  • Record Encoding - cache efficiency
    • records encoded by value, not declaration
    • string “abc” occupies the same space in varchar(3) or varchar(4096)
    • the number 7 is the same where small, medium, int, bigint, decimal or numeric
  • MVCC
    • update ops create new record versions
    • new one is tagged with id, points to old version
    • keep tracks which
  • Updates are in memory
    • held in memory pending commit
    • index changes held in memory
    • verb rollback is dirt cheap
    • trxs rollback is dirt cheap
  • At commit
    • pending record updates flushed to serial log
    • pending index updates flushed to serial log
    • commit record written to serial log
    • serial log flushed to the oxide
    • and trx is also committed
  • Memory is infinite, so
    • large txns chills uncommitted data (flushes it to the log early)
    • chilled records can be thawed
    • scavenger garbage collects unloved records periodically
    • when things get really had, entire record chains flushed to backlog
  • Weakness
    • transactions are ACID but not serializable
    • latency advantage disappears at saturation
    • very large transactions degrade performance
    • optimized for web, not batch
  • Strengths
    • runs like a memory db when data fits
    • scales like disk-based db when db doesn’t fit in cache
    • lowest possible latency for web apps
    • absorbs huge spiky loads
  • Performance
    • benchmark against InnoDB vs Falcon only
    • DBT2 benchmark (what about sysbench?)
    • High contention
    • Writes intensive - 40% records touched are updated
    • measures only performance at saturation
  • DBT2 is InnoDB’s best spot and Falcon’s worst, so do not take benchmark results, decide on what you want
  • When should you use what ?
    • don’t need ACID ? then MyISAM is good
    • single processor, small memory - InnoDB is good
    • large transactions, batch inserts/updates, InnoDB is good
    • multi cores, more memory, more threads , use Falcon
    • For web, Falcon is hard to beat

MySQL: Planet MySQL

InnoDB 1.0: “Fast index creation: add or drop indexes without copying the data”

Because the write locking happens at data copy time. If this pans out looks like they’ll have managed to stay focused post-Oracle acquisition.

Kellan-Elliot-Mcrea: Laughing Meme

MySQL Conference and Expo 2008, Day One

Today is the first day at the conference (aside from the tutorials, which were yesterday). Here’s what I went to:

New Subquery Optimizations in 6.0

By Sergey Petrunia. This was a similar session to one I went to last year. MySQL has a few cases where subqueries are badly optimized, and this session went into the details of how this is being addressed in MySQL 6.0. There are several new optimization techniques for all types of subqueries, such as inside-out subqueries, materialization, and converting to joins. The optimizations apply to scalar subqueries and subqueries in the FROM clause. Performance results are very good, depending on which data you choose to illustrate. The overall point is that the worst-case subquery nastiness should be resolved. I’m speaking of WHERE NOT IN(SELECT…) and friends. It remains to be seen how this shakes out as 6.0 matures, and what edge cases will pop up.

The Lost Art Of the Self Join

This was just great. Among many other things, Beat Vontobel showed how a Su Doku can be solved entirely with declarative queries: a very large self-join query against a table of digits and a table of the board’s initial state. I had been promoting this session because last year’s was so very good. I can’t wait to see what he comes up with for next year. Can he find another creative idea? Time will tell.

He wasn’t able to solve a 9×9 puzzle with MySQL because of the limitation on the number of joins, but PostgreSQL had no trouble doing it.

EXPLAIN Demystified

This was my session, of course. (Slides will be on the O’Reilly conference site, if they aren’t already). It went great, I thought. The room was full and people were standing in the back of the room and in the door. The questions came fast and furious; all really good questions. I think we ended up exploring a lot of the MySQL query execution method, strengths, and weaknesses by the time we were through. And I gave away all the remaining Maatkit t-shirts. Hopefully the people who took them will wear them tomorrow and the conference will be sea of deep, rich red shirts.

Someone did an audio recording of the session, but I don’t recall who it was.

Investigating InnoDB Scalability Limits

This session was given by Peter Zaitsev (disclosure: I now work for Percona, the company he co-founded). Peter and Vadim Tkachenko spent a lot of time over the last weeks and months running a dizzying array of benchmarks on MySQL 5.0.22, 5.0.51, and 5.1.24 (if I recall the versions correctly). They were able to show InnoDB’s scaling patterns for a number of different micro-benchmarks on a variety of configurations. If you didn’t attend, please look up the slides if you care about InnoDB performance. A lot of work went into the benchmarks — a lot of work. The slides should be on the conference website or on our blog, http://www.mysqlperformanceblog.com/.

Replication Tricks and Tips

Lars Thalmann and Mats Kindahl gave this session. At a high level, I’d say it was a run-down of all the different ways you can use MySQL replication. Replication is really a flexible tool, and they covered a large array of the most important ways you can use it to achieve different purposes. Many of the techniques they mentioned are implemented by various tools in Maatkit. A couple of the others are implemented in MySQL Master Master Manager and MySQL Semi Multi-Master tools. Don’t re-code these! You can save weeks of work and get quality code by using the pre-built tools. (I built Maatkit, so I know exactly how tricky it is to get some of these things right.)

BoF Sessions

I dropped in on a few BoF sessions, including the Sphinx one and the PBXT/Blob Streaming one. (Keep an eye on the PrimeBase folks — they are up to great things.) Ronald Bradford protected me from those who wanted to get me drunk. Hint: it’s really easy… I have to say, though, Monty’s black vodka was amazing.

Speaking of Blob Streaming, Paul McCullagh and I were talking earlier in the day about the project’s name, MyBS. This has been smirked about a few times. I think it’s a great name, because after all my initials are BS (I usually insert one of my four middle names in to alleviate this problem, but I digress). The conversation went like this:

Me: I like it. My initials are BS.

Paul: BS actually means British Standard, so it can’t be bad.

Me: Better than American Standard. That’s a toilet.

We also debated the merits of watching the original move The Blob. It’s a classic. It must be good.

, , , , , , , , , , , , , , ,

MySQL: Planet MySQL

Storage Engines at the MySQL Conference

I’ll be following closely the progression of Storage Engines available in the MySQL Database server, well soon to be available when 5.1 gets to GA (hopefully by end of Q2 which is what we have been told). Tick, Tick, time is running out.

PrimeBase XT (PBXT) and Blob Streaming is obviously my clear focus, actually now working for PrimeBase Technologies, the company which I want to note for people is an Open Source company, committed at providing an open source alternative to the other commercial players. You also have at the MySQL Conference talks on the the existing InnoDB from Innobase (a subsidiary of market RDBMS leader Oracle). There is a Nitro presentation, an Infobright presentation, no Solid presentation surprisingly (the IBM news happening after submissions closed). We also have from MySQL, presentations on the internally developed storage engines Falcon and Maria, both products that won’t even be in 5.1 but 6.0, however Maria is presently a different branch of 5.1 so I don’t know how that works. Will it be in 5.1?

But what I want to seek is more news of KickFire, a Diamond Sponsor, an engine with embedded H/W, something that’s been obviously worked on in reasonable stealth. For me it’s not just interesting, it’s a competitor in our technology space, so I’ve been researching Joseph Chamdani and some of his patents.

Plenty of news in the past few weeks on Kickfire including Kickfire Update by Keith Murphy on April 3, Kickfire: stream-processing SQL queries by Baron Schwartz on April 4, Kickfire looking to push MySQL limits by Farhan Mashraqi on April 4, and Kickfire Kickfire Kickfire by Peter Zaitsev on April 4, and myself back on March 23.

So what can I make from the lack of company information and posted information to date.

  • Hardware based acceleration.
  • No Solid State Drive (SSD) Technology, at least not yet but C2App mentions SSD.
  • Data Warehousing, lending to thinking it’s not a transactional storage engine
  • A new storage engine and a new approach to data storage. I find this surprising, as it takes years to develop a feature complete storage engine, and most new 5.1 storage engines are indeed existing products, take Nitro, Solid, Infobright and Falcon. Only PBXT has been written from the ground up for MySQL 5.1, so looking to know more about it’s development
  • Expensive, it’s dedicated H/W + (assuming) MySQL Enterprise + Storage Engine

MySQL: Planet MySQL

The scoop on Innodb reads

I spent some time going through the source the other day in order to try to understand the difference between these SHOW STATUS variables in 5.x:


Innodb_buffer_pool_read_requests
Innodb_buffer_pool_reads
Innodb_pages_read

Here's what I discovered that I'm not sure is 100% right, but I suspect is at least on the right track:

Innodb_buffer_pool_read_requests:  requests to get something from the buffer pool.  This isn't really that great of a mystery.  I suspect these are 'direct' read requests, meaning those caused directly from a query of some kind.  

read more

MySQL: Planet MySQL

STATUS variables: the difference between innodb_pages_read and innodb_buffer_pool_reads

I haven't gotten any responses to my question from Friday, so I figured I'd post a more direct question:


The two variables 'innodb_pages_read' and 'innodb_buffer_pool_reads' are different.  'Innodb_pages_read' seems consistently higher than the buffer_pool_reads.  Certainly the buffer pool reads 'pages', but what else would be counted as a page read?


On the flip side, 'innodb_pages_written' and 'innodb_buffer_pool_pages_flushed' seem consistent.  

Anyone have any clues?

MySQL: Planet MySQL

It's all about the metrics, but what do they mean?

I'm working on an update to my myq_gadgets package for 5.1, particularly the innodb stats tracker.  I'm liking the additional stats that are in SHOW STATUS now for Innodb, particularly because I don't need to parse SHOW INNODB STATUS anymore.  I'm in love with being able to compare logical to physical I/Os for the buffer pool.  


I'm a bit confused, however, as some of my numbers don't quite line up.  Here's what I've got so far:

            | Innodb Engine       Buffer Pool                        Data                     Log       Lock
Time         read  ins  upd  del  new read %phy wrte %phy %dirt wait read      wrte      fsyc wrte fsyc wait time
02/22-10:06  3.8k    0    0    0    0 9.8k  259 15.0  0.1   0.0    0  351 6.6M  0.9 2.6K  0.9  0.7  0.8    0    0
02/22-10:07  5.6k    0    0    0    0  14k  344 21.8  0.1   0.0    0  461 8.7M  1.3 3.9K  1.3  1.0  1.1    0    0

read more

MySQL: Planet MySQL

An InnoDB tutorial

MySQL offers a variety of storage engines giving you a lot of flexibility in managing your storage and data access needs. Still I encounter customers who are not using this flexibility when they should, because they lack information about the advantages of Non-MyISAM storage engines or which are using storage engines like InnoDB as if they were using MyISAM.

This is the introductory article in a series of texts that will hopefully once become an InnoDB tutorial. A german version of this article is available in my german language blog.

An InnoDB Tutorial

The InnoDB storage engine is an engine that can be operated ACID compliant, does transactions and foreign key constraints. It is useful for all applications that do online transaction processing or have a high rate of concurrent write accesses for other reasons.


Continue reading "An InnoDB tutorial"

MySQL: Planet MySQL

Transactions - An InnoDB Tutorial

This is an english translation of the second part of an article in my german language blog.

Transactions - An InnoDB tutorial

InnoDB does transactions. Meaning: It collects statements working on InnoDB tables and applies them on COMMIT to all tables "at once". Either all of these statements inside one transaction succeed ("commit") or all of them fail ("rollback"), changing nothing.

By default, the database is in AUTOCOMMIT mode. Meaning: The server sees a virtual COMMIT command after each statement. You can disable autocommit completely, or you are starting an explicit transaction inside autocommit using the BEGIN statement.

Continue reading "Transactions - An InnoDB Tutorial"

MySQL: Planet MySQL

Configuring InnoDB - An InnoDB tutorial

This is the english translation of another article in my german language blog.

How are transactions organized physically

When InnoDB creates a new transaction it is not yet committed. The database has not yet made any promises to the application and so we do not really have to make anything persistent so far.

To be fast InnoDB tries to assemble the transaction in a memory buffer, the innodb_log_buffer. It should be sufficiently large that you actually can assemble such a transaction in memory without needing to write it out in part into the redo log. A size of 1M to 8M is normal.

Once a transaction is to be committed InnoDB has to read the page from disk which contains the image of the row that is being changed. It then has to actually make that change in memory. The changed page is cached in a memory pool called the innodb_buffer_pool. This pool also caches unchanged pages that have been accessed by read operations. All of these pages on disk and in memory are 16K in size and the innodb_buffer_pool_size determines how much RAM we will use as a cache for such pages - usually as much as we can spare.

Continue reading "Configuring InnoDB - An InnoDB tutorial"

MySQL: Planet MySQL