I spoke at the MySQL Conference and Expo this year about the architecture we have here at dealnews.com. After my talk, Jimmy Guerrero of Sun/MySQL invited me to give a webinar on how dealnews uses memcached. That is taking place next week, Thursday, October 09, 2008. It is a free webinar. We have used memcached in a variety of ways as we have grown. So, I will be talking about how dealnews used memcached in the past and present.
For more information, visit the MySQL web site.

Last week Kenai went beta, with the usual services in a development hub site plus an additional "connected" angle. Our GF CORBA project is already using its Hg repository but another very interesting angle is the technology mix.
Kenai achieved development agility with reliability by using a combination of our scripting (JRuby/Rails) and enterprise (GlassFish v2, MySQL, OpenSolaris) technologies. These combinations are beginning to pop up all over and are one of the key targets of GlassFish, using JRuby (see Nick's Blog site), Groovy (see Glenn's GroovyBlogs), or others. Back to Kenai, check out Tim's Interview with Nick, and some Technical Details on Caching and in Testing/Performance Methodology. Also see Pictures from Austvik, Spotlight from Arun and Lenz's Technology Overview.
Here's my presentation on Memcached and MySQL:
You can download the sample files here:
This Saturday, May 10th, is MinneBar, Minnesota's BarCamp. MinneBar is described as an "(un)Conference" which means it's a free, ad-hoc gathering of technology folks where everyone is encouraged to contribute.
There are a lot of great sessions this year. I'll be giving a presentation titled "Memcached & MySQL Sitting in a Tree." The talk is about the new Memcached Functions for MySQL. I'll talk a bit about the what, why, and how of this set of awesome UDFs.
I'm not sure what time I present and I think I have 50 minutes, but I don't know for sure. I'm trying something new this time around; I'll be publishing my presentation on SlideShare.
We are still 3 days away and there are currently 356 people signed up which is right around how many people were signed up last year. If you are in the Minneapolis/St. Paul area, you should come to participate and learn!
To register, visit their website, click the "login" link in the top right, use the password "c4mp" to log in, then edit the main page and add yourself to the bottom. Registration starts at 8:00am, so remember to set an alarm. :)
Hope to see you there!
Here is number 5 in my series of six podcasts from last week's MySQL conference and expo.
Just after lunch on Tuesday, I was able to corner Brian Aker, former CTO of MySQL, introduce myself and ask him if he was up for a podcast. Without any convincing or arm twisting he happily agreed. :)
My interview with Brian (9:18) Listen (Mp3) Listen (ogg)

Brian's lenses adapt to match the art around him.
Some of the topics we tackle:
I know I’m a little late to the discussion, but Brian Aker posted a thought-provoking piece on the imminent death of MySQL replication to scale reads. His premise is that memcached is so cool and scales so much better that read replication scaling is going to become a thing of the past. Other MySQL community people, like Arjen and Farhan, chimed in too.
Now, I love memcached. We use it as a vital layer in our datacenters, and we couldn’t live without it. But it’s not a total solution to all reads, so at least for our use case, it’s not going to kill our replica slaves that we use to scale reads.
Why? Because we still need to do index lookups to get the keys that we can extract from memcached. And we have to do lots of those indexed queries. Most of the row data lives inside of memcached, so this turns out to be a great solution, but we still need read slaves to provide the lists of keys. Bottom line is that we still use read replication heavily - but we use it for different things than we did in years past.
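The split described above, where an indexed query on a read slave returns only keys and the row data comes from memcached, could look roughly like the sketch below. It is an illustration under stated assumptions rather than anyone's production code: the `posts` table, the `post:<id>` key format, and the function name are hypothetical, a plain dict stands in for a memcached client, and sqlite3 stands in for a MySQL read slave.

```python
import sqlite3

# Stand-in for a MySQL read slave (assumption: a hypothetical `posts` table).
replica = sqlite3.connect(":memory:")
replica.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, author TEXT, body TEXT)")
replica.executemany(
    "INSERT INTO posts (id, author, body) VALUES (?, ?, ?)",
    [(1, "ann", "first"), (2, "bob", "second"), (3, "ann", "third")],
)
replica.execute("CREATE INDEX idx_author ON posts (author)")

# Stand-in for a memcached client: a dict keyed by "post:<id>" (hypothetical format).
cache = {"post:1": {"id": 1, "author": "ann", "body": "first"}}


def posts_by_author(author):
    """Get keys from an index lookup on the replica, row data from the cache."""
    # Step 1: the indexed query returns only primary keys.
    ids = [row[0] for row in replica.execute(
        "SELECT id FROM posts WHERE author = ? ORDER BY id", (author,))]

    # Step 2: take whatever rows the cache already holds.
    found = {i: cache[f"post:{i}"] for i in ids if f"post:{i}" in cache}

    # Step 3: go back to the replica only for the misses, then refill the cache.
    missing = [i for i in ids if i not in found]
    if missing:
        marks = ",".join("?" * len(missing))
        for pid, auth, body in replica.execute(
                f"SELECT id, author, body FROM posts WHERE id IN ({marks})", missing):
            row = {"id": pid, "author": auth, "body": body}
            cache[f"post:{pid}"] = row
            found[pid] = row

    return [found[i] for i in ids]
```

Step 3 is also the path a re-filled memcached node would lean on after a failure, which is why the replicas stay in the picture.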
And then, of course, there’s the issue of memcached failure. For us, it’s very rare, and thanks to the way memcached works, it rarely hampers system performance, but when a node fails and needs to be re-filled, we have to go back to disk to get it. And doing that efficiently means read slaves again.
For us, memcached plus MySQL replication is true magic. Brian’s a very smart guy, and I realize he wrote the post to get people thinking and talking about the issue, but at least for us, read slaves are here to stay.
Brian exposed some of his internal letters about the death of replication (caused by memcached). Back when he wrote this, I responded a bit too. Now that quite a few people really want to bury replication, let me point out some of the reasoning why it will live.
First of all, both MySQL and memcached are slow (however you look at it, they’re both fast) - in a proper gigabit environment both respond in a millisecond or so (well, MySQL is closer to 1.5ms). The major task becomes getting as much work as possible done in that round trip.
Replication lag? The major problem with it was fixed by Google patches back in 4.0, finally hitting stock MySQL in 5.1. Now the replication thread doesn’t get queued due to concurrency, and always enters execution. Use binary log position serialization for reading users, and they will never notice replication lag.
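One possible reading of "binary log position serialization" is sketched below: after a user writes, the application remembers the master's binary log coordinates in that user's session, and later reads go only to a replica that has already executed past that point. This is an interpretation, not necessarily the setup the post has in mind. The `Server` class and `pick_reader` function are hypothetical; in a real deployment the coordinates would come from `SHOW MASTER STATUS` and `SHOW SLAVE STATUS`, or the read could block on `MASTER_POS_WAIT()` instead.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    # (binlog file name, byte offset) this server has executed up to.
    position: tuple


def pick_reader(master, replicas, session_position):
    """Return a replica that has replayed the user's last write, else the master."""
    for replica in replicas:
        # Tuples compare by file name first, then by offset. This assumes the
        # zero-padded file names MySQL generates, so string order matches
        # numeric order.
        if session_position is None or replica.position >= session_position:
            return replica
    # No replica has caught up yet: read from the master so the user
    # still sees their own write.
    return master
```

A user who has not written anything recently carries no position in the session and can be served by any replica, so only the users who just wrote ever pay for the stricter routing.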
More servers? Also more performance. Putting everything into memcached? Lots of stuff still has to be written to the database. Once it is in the database anyway, one can query it from the database too. In low hit-rate situations using memcached will be 3x slower than just fetching data from the database (get/get/set vs get). I’ve seen lots of code that was enthusiastic to use memcached, but whose authors didn’t actually try to profile what the hit ratios are.
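The get/get/set versus get comparison can be put into a small back-of-the-envelope model. The function names are hypothetical, and the default latencies are just the rough figures quoted earlier in this post (about 1 ms for memcached, about 1.5 ms for MySQL), not measurements.

```python
def cache_aside_round_trips(hit_ratio):
    """Expected network round trips per read with a cache in front of the database."""
    # Hit: one cache get. Miss: cache get + database get + cache set = 3 trips.
    return hit_ratio * 1 + (1 - hit_ratio) * 3


def cache_aside_latency_ms(hit_ratio, cache_ms=1.0, db_ms=1.5):
    """Expected latency per read; the defaults are the post's rough figures."""
    miss_cost = cache_ms + db_ms + cache_ms  # get (miss), database get, set
    return hit_ratio * cache_ms + (1 - hit_ratio) * miss_cost
```

With a hit ratio of zero every read costs three round trips instead of one. Under these assumed latencies the cached path only matches a direct 1.5 ms database read at a hit ratio of about 0.8, which is why profiling the actual hit ratio matters before adding the cache.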
The major problem with memcached is that it is a hash table. All it supports in data retrieval is asking for a key and getting a value. Which works great in situations where one just has a key and gets a value. Now if 50 keys are needed, memcached will need 50 lookups, quite often routed to 50 different servers. That’s a single database B-Tree read. How does one fetch all keys from 1 to 10000 with memcached? That’s right - ask for all of them. Of course, it is easy to resolve some of the inefficiency by having separate memcached clusters for different tasks or appending information to multiple tracking objects, but that’s where the ease of distribution starts fading, and development and administration needs surface.
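To see why a batch of keys fans out across the cluster, here is a small sketch that maps keys to servers and groups a 50-key request by destination. It assumes simple modulo hashing over an MD5 digest; real memcached clients use their own hash functions and often consistent hashing, so the exact spread will differ. The `row:<n>` key format and both function names are hypothetical.

```python
import hashlib
from collections import defaultdict


def server_for(key, n_servers):
    """Map a key to a server index (assumption: MD5 digest modulo server count)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % n_servers


def group_by_server(keys, n_servers):
    """Group keys by the server that owns them; each group is one network request."""
    groups = defaultdict(list)
    for key in keys:
        groups[server_for(key, n_servers)].append(key)
    return dict(groups)


# 50 consecutive row keys spread over an 80-node cluster.
keys = [f"row:{i}" for i in range(1, 51)]
groups = group_by_server(keys, 80)
print(f"{len(keys)} keys touch {len(groups)} different servers")
```

Because consecutive IDs hash to unrelated servers, a range that a B-Tree would serve from adjacent pages becomes many separate requests here, even when the client batches keys per server.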
memcached APIs now start supporting replication too - but flapping hosts can get the environment out of sync quite fast then (host disappears, failover host starts getting traffic, host comes back with stale data…). Solution - object generation management, reading from multiple hosts, etc. - here again, solving a simple problem already needs quite some complexity.
Add the ACID properties of databases, which quite often make the whole development much easier - something that ends up quite difficult to achieve in a completely distributed ‘get/set’ environment.
And by the way - memcached can be outgunned. Hot objects can be cached directly on local application server stores, like APC object cache, file system, etc. New application servers nowadays have lots of memory.. :) Need global state? Just broadcast it to all.
There is much more that replicated databases can provide - more complex views, all indexed and snappy; a single line change doesn’t need invalidation of hundreds or thousands of objects around, and it all comes down to interactivity and serving users’ needs better. A single line change is immediately visible to all the users around.
Brian suggests using job queue systems and pushing everything to memcached - which makes it a dump of stuff instead of a cache. Putting in extra information that might be needed ends up with unnecessary evictions, which decrease the efficiency of the system too. Building those objects needs reading from the database (or other persistent store), and eventually they end up in the database too. Surprise - they can be served from the database as well! :)
Anyway, memcaching ideas are moving forward, and so is database replication. There is lots of room for replication to evolve yet - making it more async, parallel, relaxed. The whole MySQL protocol might be better - now it is all synchronous and boring.
Though replication has a storage overhead - more copies are usually saved - it also allows utilizing those copies in different ways, with different indexing schemes, while still maintaining the same image of data on different nodes. Even better, such application-specific ‘roles’ of slaves can migrate from one node to another, heating up different segments of data.
The role of database replication will still remain core for scaling out reads for various workflows. A database allows incremental changes to the infinite datasets required by various applications. Replication just multiplies system capacity for presenting those datasets. That’s good. If it was up to me, I’d let it live.
P.S. Our current memcached cluster has 80 nodes, each providing 2 GB of storage. When used properly, memcached is a great tool too. :)
opensource: del.icio.us tag/opensource
Database
linux
MySQL
perl
infrastructure
memcached
scalability
Brian Aker starts work on a memcache engine for mysql. so your memcache cache acts just like a table.
the big thing here which I’ve seen asked for a couple of times on the memcached list is the ability to see a list of keys.
mysql> select * from foo1 WHERE k="mine";
freaking amazing.. I love these kind of mashups.
and the 2nd important event.
Django is starting a branch to integrate SQLAlchemy
two interesting posts arrived on the memcached list which might be interesting to performance people.
The first was a comparison, "The fastest language binding", of which ‘P’ language performed better. Note that the PHP version actually uses libmemcache, a C library, which goes some way toward explaining the wild disparity in speeds.
The 2nd more interesting one (to me) was the discussion of how Digg switched from using mysql to memcached with v3 of their new interface to handle storing sessions, due to a hardware crash on their mysql server.
others mentioned using InnoDB for this instead of MyISAM, with the biggest issue being clearing out expired sessions (which memcached does for you with less overhead), but storing the sessions in the database still suffered due to OS-contention.
of course with django you can choose either, to cache your stuff.. but the session handling is stored directly in the database .. looks like I have a weekend project ;-)
awstats: del.icio.us tag/awstats
Apache
webrick
awstats
opensource
memcached
rubyonrails
subversion
FreeBSD
MySQL
rmagick