Content Tagged with memcached + MySQL

Memcached UDFs for MySQL version 0.6 Released

I've taken some time out of writing my book to get some coding down, and I'm pleased to announce the release of the Memcached Functions for MySQL, version 0.6. This release includes:

* Complete rewrite of error handling
* Build configuration fixes/improvements from Trond Nordby (Thanks!)
* Fixed memc_server_count
* More tests

You can find the release information, repository, and source at:

http://tangent.org/586/Memcached_Functions_for_MySQL.html

As well as

http://patg.net/downloads/memcached_functions_mysql-0.6.tar.gz

Have fun! More to come...

MySQL: Planet MySQL

Deploying Scalable Websites with Memcached


I spoke at the MySQL Conference and Expo this year about the architecture we have here at dealnews.com.  After my talk, Jimmy Guerrero of Sun/MySQL invited me to give a webinar on how dealnews uses memcached.  That is taking place next week, Thursday, October 09, 2008.  It is a free webinar.  We have used memcached in a variety of ways as we have grown. So, I will be talking about how dealnews used memcached in the past and present.

For more information, visit the MySQL web site.

      

MySQL: Planet MySQL

Hot technologies I care about - Sep 08

Don's got a great list of tech here

zfs: del.icio.us/tag/zfs

Developing Kenai - Agility on an OpenSource Enterprise Foundation

Last week Kenai went beta, with the usual services in a development hub site plus an additional "connected" angle. Our GF CORBA project is already using its Hg repository but another very interesting angle is the technology mix.

Kenai achieved development agility with reliability by using a combination of our scripting (JRuby/Rails) and enterprise (GlassFish v2, MySQL, OpenSolaris) technologies. These combinations are beginning to pop up all over and are one of the key targets of GlassFish, using JRuby (see Nick's Blog site), Groovy (see Glenn's GroovyBlogs), or others.

Back to Kenai, check out Tim's Interview with Nick, and some Technical Details on Caching and in Testing/Performance Methodology. Also see Pictures from Austvik, Spotlight from Arun and Lenz's Technology Overview.

GlassFish: The Aquarium

Should you cache?

Should you use memcached? Should you just shard mysql more?


Memcached's popularity is expanding its use into some odd places. It's becoming an authoritative datastore for some large sites, and almost more importantly it's sneaking into the lowly web startup. This is causing some discussion.

Most of the people discussing it seem to be missing the point. In this post I attempt to explain my point of view on how memcached should really influence your bouncing baby startups, and even give some pointers to the big guys who might have trouble seeing the forest for the trees.

Using memcached does not scale your website! Entertain me, I'm playing semantics here: This thing is not for scaleout. Mostly. What memcached really is, is a giant floating magnifying glass. It takes what you have already built and makes it stretch ten times further. I insist on not confusing caching with scaleout, because when your little stretch-armstrong of a website hits that tenfold limit, you're still screwed. There's no magic switch or configuration option in memcached that will save you from dealing with proper optimization and sharding.

You sure can get away with a hell of a lot though!

Keep this in the front of your mind: no, it will not help you batch your writes, or make them smaller, or really help you deal with them in any useful way. If you want to write data you will need back later, you must shard. If it's data you don't care about, maybe write it to memcached and make a note of it in your business plan.

Also keep strongly in mind: memcached won't make your cache misses suck less. If you're writing awful data warehouse quality queries which you expect to run live on the site, go bust out the failboat and get-a-rowin'. You're screwed. As your dataset grows you will find new slices of hell in which your queries behave in all new ways. What once scanned "a few extra rows" now might hit tens of thousands. Cache misses will suck. You will have to deal with this. That's not something this solves.

Sometimes memcached does let you achieve the impossible, or scale the unlikely. Take slightly complex queries, or even template operations, which under the best of conditions might take 15-20 milliseconds each. An obnoxious join, a weird subquery, a tree walk, or fancy HTML templating. Being able to do this live could mean the difference between your website standing apart or having to settle for an awful workaround. In these cases, with a high enough hit rate, you can soak those cache misses and make the feature work.

My example isn't translating a 5 second query into 0.5ms with memcached, it's a 15-20ms query. If you had a dozen of these in a page load, a bad load might take an extra quarter second to render, but it wouldn't ruin the user experience. The issue memcached solves here is subtle. Tacking on 0.25 seconds per page render might not make the site completely unusable, but realize these queries are using solid resources on your expensive hardware for that extra quarter second. With a quadcore database, it's possible under the best conditions you would only be able to render 14-16 pages per second off of that machine. Throw in all the other things you have to do on a page load, writes, internal database whoosits and uneven CPU usage and you'd be lucky to get 5 pages per second.
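To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The function name and the simplification that every uncached query ties up one database core for its full duration are assumptions made for illustration, not measurements from any real site.

```python
def pages_per_second(cores, queries_per_page, query_ms, hit_rate=0.0):
    """Rough upper bound on page renders per second for one database box.

    Assumes each uncached query occupies one core for query_ms and that
    cache hits cost the database nothing at all.
    """
    db_seconds_per_page = queries_per_page * (1.0 - hit_rate) * query_ms / 1000.0
    if db_seconds_per_page == 0:
        return float("inf")  # every query is served from cache
    return cores / db_seconds_per_page


# A dozen 20 ms queries per page on a quad-core database, no cache at all.
uncached = pages_per_second(cores=4, queries_per_page=12, query_ms=20)

# The same page when 90% of those queries are answered by memcached.
cached = pages_per_second(cores=4, queries_per_page=12, query_ms=20, hit_rate=0.9)

print("uncached: %.1f pages/s, 90%% hit rate: %.1f pages/s" % (uncached, cached))
```

With these inputs the uncached figure comes out just under 17 pages per second, in the same ballpark as the best-case estimate above, and a 90% hit rate multiplies it by ten. The misses still cost a full 20 ms each.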

In this case, it's still walking the line of scalability, but it turns something mildly impossible into something highly probable. On the cheap.

The cost equation


Now the most important factor here has reared its ugly head: Cost.

Cost. Ugly for startups. Ugly for established companies. Nightmares for venture capital. What is your cost? Why am I talking cash about companies who have millions of dollars in VC or sales? Just buy more servers! Whatever, right?

Well no. The largest cost is time. All others pale in comparison. The best physical goods investments your company can make are more related to your people than your hardware. Hardware has horrific depreciation. Most of the value is lost immediately, the rest over the first year of operation.

In comparison, buying your employees really fucking nice chairs, desks, and monitors in a swanky comfortable office are much more solid investments for your company. Aeron chairs have great resale value for that inevitable going-bumpkus dot bomb sale. Also anything you do to make your workers happier and more productive will pay out more than any hardware investment. Your product ships on time, you react to the market faster.

To sidestep into hardware a little... Always max out the RAM in your databases. Everyone should. I didn't realize people don't actually do this until I read some of these arguments against memcached. Whenever I add memcached to a website, the RAM memcached gets is RAM that didn't fit into the databases, but easily fits into empty memory slots in webservers or cheaper hardware. A good solid database might cost $5,000, but a beefy memcached box will cost less than half that. Way less than that if you just add memory to existing hardware. So "adding that extra RAM to your databases" isn't a very fair apples-to-apples comparison unless you're already doing something wrong.

So it should be obvious just what the hell I'm getting at now, and what seems to be bothering everyone else about this whole stupid memcached fad.

You're all wasting your goddamn time! Yeesh!

How can a small site or startup benefit from memcached?


Simple: The idea.

Caching really wedges your whole RDBMS worldview. You don't just CRUD anymore. Your data is a process. A flow between points instead of just the store and display. At any time in this flow an idea may be injected. Maybe it's serializing a generated object and caching it, maybe it's utilizing gearman to shift off some asynchronous work. There is just more to it now.

But that's all messy and complicated. What can you do? What should you do?

Design for having cache, design for change.
... but don't write all the code yet.
... but certainly design for change.

Think good object design. A "user" is a class. That user has base properties which you might find in the `user` table. A "user" object might have a profile, which is really another object with another class representing a `profile` table.

my $user; is an invaluable abstraction.

That user object must load and store data. When you build this at first it's all standard CRUD. Straight to a database.

Where would you think to add caching to this system? I hope I've made it too obvious.

At the query layer! Use a database abstraction class and have it memcache resultset objects and... No no no, that's a lie. I'm lying. Don't do that.

Do it inside that $user object. At the highest level possible. Take the whole object state and shovel it somewhere. That object is its own biggest authority. It knows when it's been updated, when it needs to load data, and when to write to the database. It might've had to read from several tables or load dependent objects based on what you ask it to do.

Instead of wrangling your best and brightest into figuring out a cache invalidation algorithm which might work "okay" against your schemas, do what's simple for the object. If adding caching to the $user object means the load() function tries memcached first, and all write operations hit memcached with a delete operation, so be it. You just added basic caching to one of the hottest objects in your website in, oh, half an hour. Maybe a few days if you're really scraping the bottom of the talent barrel.
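A minimal sketch of that idea, written in Python rather than the Perl hinted at by `my $user;`. Everything here is hypothetical: `FakeCache` and `FakeDatabase` are in-process stand-ins for a memcached client and for the `user` and `profile` tables, and the `User` class only shows where the cache calls would go.

```python
import copy


class FakeCache:
    """In-process stand-in for a memcached client (get/set/delete only)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return copy.deepcopy(self._data.get(key))

    def set(self, key, value):
        self._data[key] = copy.deepcopy(value)

    def delete(self, key):
        self._data.pop(key, None)


class FakeDatabase:
    """Stand-in for the `user` and `profile` tables; counts reads."""

    def __init__(self):
        self.users = {}      # user_id -> row from `user`
        self.profiles = {}   # user_id -> row from `profile`
        self.reads = 0

    def load_user_state(self, user_id):
        self.reads += 1
        return {"user": dict(self.users[user_id]),
                "profile": dict(self.profiles[user_id])}

    def save_user_state(self, user_id, state):
        self.users[user_id] = dict(state["user"])
        self.profiles[user_id] = dict(state["profile"])


class User:
    """The object is its own authority: it decides when to cache and invalidate."""

    def __init__(self, user_id, db, cache):
        self.user_id = user_id
        self.db = db
        self.cache = cache
        self.state = None

    def _key(self):
        return "user:%d" % self.user_id

    def load(self):
        state = self.cache.get(self._key())      # try memcached first
        if state is None:                        # miss: go to the database
            state = self.db.load_user_state(self.user_id)
            self.cache.set(self._key(), state)   # cache the whole object state
        self.state = state
        return self

    def save(self):
        self.db.save_user_state(self.user_id, self.state)
        self.cache.delete(self._key())           # every write invalidates the cache


db, cache = FakeDatabase(), FakeCache()
db.users[1] = {"name": "alice"}
db.profiles[1] = {"bio": "hi"}

user = User(1, db, cache).load()      # miss: one database read
User(1, db, cache).load()             # hit: no database read
user.state["profile"]["bio"] = "updated"
user.save()                           # database write plus cache delete
fresh = User(1, db, cache).load()     # miss again, picks up the new state
```

Nothing outside the `User` class needs to know whether caching is switched on, which is what makes it cheap to add later or turn off again.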

Now we're back where we started. Reap the time benefits! Abstract your data access methods properly, plan for caching. Actually go write caching into a few objects. Maybe turn it off when you're done. You don't need it yet. Write your objects to talk directly to your database and save time.

Same idea for sharding. Either focus on that now, or realize you can take a $user object and extend its load() magic to find and write to users based on a sharding scheme. You probably don't have to rewrite all of the code to make this happen. Refactor to win.

So now you're ready. You're building your site fast and abstracting where you can. Brace for change. Be ready to shard, be ready to cache. React and change to what you push out which is actually popular, vs overplanning and wasting valuable time. Keeping it simple is gold here.

You're building something new and you're going to fail at it. Your design will be wrong, you will anticipate the wrong feature to be popular. Dealing with this quickly can set you apart. Being able to slap memcached into a bunch of objects in a few days (or even hours) can mean the difference between riding a load spike or riding the walrus.

Bullet points for fun! How can your small site benefit from memcached:

- Design for change! Holy crap I can't say this enough.
- Don't cache in ways that piss off your users.
- Not keeping it simple is fail.
- Cache and shard at the highest level possible relative to your data.
- Read High Performance MySQL 2nd ed. Memcached won't fix your lack of database knowledge.
- The same ideas which help you prepare for caching help you prepare for sharding.
- Don't waste all your time getting it right now. Get it close, get an idea, try it out, and prepare to be wrong.

Finally:

- Keep an open mind. Sites like grazr and fotolog do things differently. Doesn't mean they're right, doesn't mean they're wrong. Be inventive where it makes sense for your business.

There. Sorry this came out so long :)

MySQL: Planet MySQL

Cache your sessions. Don't piss off your users

I hope you're all enjoying the 1.2.6 stable release of memcached. Don't want to hear no whining about it crashing!

One of the most common questions in memcached land is the ever obnoxious "how do I put my sessions in memcached?". The long standing answer is usually "you don't", or "carefully", but people often walk the dark path instead. Many libraries do this as well, although I've seen at least one which gets it.

This isn't as huge of a deal as people make it out to be. I've been asked about this over the mailing list, in IRC, in person, and even in job interviews. What people end up doing gives me the willies! Why! Why why why... Well, I know why.

So what is the deal with sessions? Why does everyone want to jettison them from mysql/postgres/disk/whatever? Well, a session is:

- Almost always larger than 250 bytes, and almost always smaller than 5 kilobytes.
- Read from datastore for every logged in (and often logged out) user for every dynamic page load.
- Written to the datastore for every dynamic page load.
- Eventually reaped from the database after N minutes of inactivity.

Ok well that sucks I guess. Every time a user loads a page we read a blob row from mysql, then write a blob row back. This is a lot slower than a row without blobs. Alright, so I see it now. Memcached to the rescue!

Er, except maybe it's a little complicated to actually memcached these things, since we need a write for every read... Why not just use memcached for sessions!? It lines up perfectly! Check it out:

- Set a memcached expire time for the max inactivity for a session. Say 30 minutes...
- Read from memcached.
- Write to memcached.
- A miss from memcached means the user is logged out.

Voila! ZERO reads or writes to the database, fantastic! Fast. Except I really don't like the tradeoffs here. This is one example where I believe the experience of both your users and your operations team is cheapened. Users now get logged out when anything goes wrong with memcached! Operations has to dance on eggshells. Or needles. Painful.

- Evictions are serious business. Even if you disable them (-M), out of memory errors mean no one can log into your site.
- Upgrading memcached, OS kernel, hardware, etc, now means kicking everyone off your site.
- Adding/removing memcached servers kicks people off your site. Even with consistent hashing, while the miss rate is low it's not going to be zero.
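The last point can be illustrated with a toy experiment in Python. This is a sketch under stated assumptions: `HashRing` is a hypothetical, simplified consistent-hash ring (MD5 points, 100 points per server), not the algorithm of any particular memcached client, and the server names and session keys are made up.

```python
import bisect
import hashlib


def stable_hash(value):
    """Map a string to a large integer with MD5 (stable across runs)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Simplified consistent-hash ring with several points per server."""

    def __init__(self, servers, points_per_server=100):
        self._ring = sorted(
            (stable_hash("%s#%d" % (server, i)), server)
            for server in servers
            for i in range(points_per_server)
        )
        self._points = [point for point, _ in self._ring]

    def server_for(self, key):
        # First ring point at or after the key's hash, wrapping around.
        index = bisect.bisect(self._points, stable_hash(key)) % len(self._ring)
        return self._ring[index][1]


def fraction_moved(keys, lookup_before, lookup_after):
    """Share of keys that land on a different server after the change."""
    moved = sum(1 for key in keys if lookup_before(key) != lookup_after(key))
    return moved / len(keys)


keys = ["session:%d" % i for i in range(10000)]
servers = ["cache1", "cache2", "cache3"]
before = HashRing(servers)
after = HashRing(servers + ["cache4"])

ring_moved = fraction_moved(keys, before.server_for, after.server_for)
modulo_moved = fraction_moved(
    keys,
    lambda key: stable_hash(key) % 3,   # naive "hash mod server count"
    lambda key: stable_hash(key) % 4,
)
print("consistent hashing: %.0f%% of sessions moved" % (ring_moved * 100))
print("modulo hashing:     %.0f%% of sessions moved" % (modulo_moved * 100))
```

Going from three servers to four, the ring should move somewhere around a quarter of the keys and the modulo scheme around three quarters. Either way, every moved key is a session that memcached can no longer find, which is a logged-out user if memcached is the only place the session lives.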

So now what? Well we have zero accesses on our database, so it's fast! But we can't ever touch memcached again in fear of ticking off users. Progress be damned! Before you all think I'm completely off my rocker, I will admit there are some legitimate reasons to do this. If the way your site works doesn't really impact users on loss of a session, or impacts few enough users, you can use this design pattern. How many people are actually affected if you get logged out of wikipedia.org? Well, the people writing revisions certainly mind, but the greater userbase is unaffected. They're a non profit, they understand the tradeoff, etc. So that's fine. It's not fine for a lot of the people I see suggesting it or doing it. As developers get more comfy with memcached the session issue will become more of an obvious bottleneck.

The memcached/mysql hybrid really isn't that bad at all. You can get rid of over 90% of the database reads, a lot of the writes, and leave your users logged in during rolling upgrades of memcached.

First, recap the components involved: The page session handler itself, and some batch job which reaps dead sessions. For small websites (like a vbulletin forum) these batch jobs are often run during page loads. For larger sites they will be crons and so forth. This batch job can also be used to save data about sessions for later analysis.

The pattern is simple. For reads, fetch from memcached first, database second. For writes, always write to memcached, and also write to the database if you haven't synced the session there in the last N seconds. So with N set to 120, a user who is clicking around will only write to the database once every 120 seconds, and write to memcached every time.

Now modify the batch job. Crawl all sessions the database considers expired, and check memcached for the latest data. If a session is not really expired, don't expire it; if it is, use the latest possible data from memcached. Write back to the database. Easy.
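Here is one way the handler and the reaper could fit together, as a Python sketch. The names (`SessionStore`, `SYNC_INTERVAL`, `SESSION_TIMEOUT`) are hypothetical, and plain dicts stand in for both memcached and the sessions table, so key expiry inside memcached is not modeled.

```python
import time

SYNC_INTERVAL = 120      # N: seconds between database syncs of a live session
SESSION_TIMEOUT = 1800   # N: seconds of inactivity before a session is dead


class SessionStore:
    def __init__(self, cache, db, clock=time.time):
        self.cache = cache   # dict standing in for memcached
        self.db = db         # dict standing in for the sessions table
        self.clock = clock

    def read(self, sid):
        record = self.cache.get(sid)             # memcached first
        if record is None:
            record = self.db.get(sid)            # database second
            if record is not None:
                self.cache[sid] = dict(record)   # re-warm memcached
        return record

    def write(self, sid, data):
        now = self.clock()
        previous = self.cache.get(sid) or self.db.get(sid) or {}
        synced_at = previous.get("synced_at")
        record = {"data": data, "last_seen": now, "synced_at": synced_at}
        if synced_at is None or now - synced_at >= SYNC_INTERVAL:
            record["synced_at"] = now
            self.db[sid] = dict(record)          # occasional database write
        self.cache[sid] = record                 # memcached write every time

    def reap(self):
        """Batch job: expire dead sessions, trusting memcached for fresh data."""
        now = self.clock()
        reaped = []
        for sid, row in list(self.db.items()):
            if now - row["last_seen"] < SESSION_TIMEOUT:
                continue                         # database says still alive
            latest = self.cache.get(sid)
            if latest is not None and now - latest["last_seen"] < SESSION_TIMEOUT:
                latest["synced_at"] = now        # not really expired: refresh
                self.db[sid] = dict(latest)
                continue
            final = latest or row                # freshest data we still have
            reaped.append((sid, final["data"]))  # e.g. archive for analysis
            del self.db[sid]
            self.cache.pop(sid, None)
        return reaped


now = [1000.0]                                   # fake clock for the demo
store = SessionStore(cache={}, db={}, clock=lambda: now[0])
store.write("abc", {"cart": 1})                  # first write hits both stores
now[0] += 30
store.write("abc", {"cart": 2})                  # memcached only
store.cache.clear()                              # simulate a memcached restart
survivor = store.read("abc")                     # still logged in, slightly stale
```

After the simulated restart the user is still logged in, with the cart as it stood at the last database sync. That is the "mildly lossy" tradeoff: recent changes can be lost, the login cannot.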

You take the tradeoff of sessions being mildly lossy for recent information, but you gain reliability back in your system. Reads against the database should be almost nonexistent, and write load should drop significantly, but not as much as reads.

So please, if you run some website I might eventually use, don't put memcached in a place where restarting individual servers might piss me off. Thanks :)

I'd like to also challenge maintainers of session libraries for all languages to turn this design pattern into tunable (note all the places where I wrote N) libraries folks can plug in and use.

The more standard this stuff is the more likely the next fancy startup is going to get it right. Reuse is a great thing. I can't say enough about how far great efforts like krow's libmemcached go toward standardizing how we use memcached, but it's also a great help to ship libraries for common design patterns.

MySQL: Planet MySQL

Memcached and MySQL Presentation

Here's my presentation on Memcached and MySQL:


You can download the sample files here:

MySQL: Planet MySQL

MinneBar 2008 This Weekend!

This Saturday, May 10th, is MinneBar, Minnesota's BarCamp. MinneBar is described as an "(un)Conference" which means it's a free, ad-hoc gathering of technology folks where everyone is encouraged to contribute.

MinneBar

There are a lot of great sessions this year. I'll be giving a presentation titled "Memcached & MySQL Sitting in a Tree." The talk is about the new Memcached Functions for MySQL. I'll talk a bit about the what, why, and how about this set of awesome UDFs.

I'm not sure what time I present and I think I have 50 minutes, but I don't know for sure. I'm trying something new this time around; I'll be publishing my presentation on SlideShare.

We are still 3 days away and there are currently 356 people signed up which is right around how many people were signed up last year. If you are in the Minneapolis/St. Paul area, you should come to participate and learn!

To register, visit their website, click the "login" link in the top right, use the password "c4mp" to login, then edit the main page, and add yourself to the bottom. Registration starts at 8:00am, so remember to set an alarm. :)

Hope to see you there!

MySQL: Planet MySQL

MySQL Conf08 - Hangin' with Brian Aker

Here is number 5 in my series of six podcasts from last week's MySQL conference and expo.

Just after lunch on Tuesday, I was able to corner Brian Aker, former CTO of MySQL, introduce myself and ask him if he was up for a podcast.  Without any convincing or arm twisting he happily agreed. :)

My interview with Brian (9:18)  Listen (Mp3)   Listen (ogg)


Brian's lenses adapt to match the art around him. 

Some of the topics we tackle:

  • Brian's parade of titles and where he's ended up within Sun
  • Amazon.com, durable memory, and what that means for EC2
  • Getting Memcached to run on larger systems
  • Getting access to larger hardware in general and being able to address scalability issues first hand
  • What OS Brian runs on his laptop
  • The coolest thing about MySQL
Pau for now...

MySQL: Planet MySQL

Using MySQL and Memcached on the GlassFish Application Server

GlassFish: del.icio.us/tag/glassfish

Death of MySQL read replication highly exaggerated

I know I’m a little late to the discussion, but Brian Aker posted a thought-provoking piece on the imminent death of MySQL replication to scale reads.  His premise is that memcached is so cool and scales so much better, that read replication scaling is going to become a thing of the past.  Other MySQL community people, like Arjen and Farhan, chimed in too.

Now, I love memcached.  We use it as a vital layer in our datacenters, and we couldn’t live without it.  But it’s not a total solution to all reads, so at least for our use case, it’s not going to kill our replica slaves that we use to scale reads.  

Why?  Because we still need to do index lookups to get the keys that we can extract from memcached.  And we have to do lots of those indexed queries.  Most of the row data lives inside of memcached, so this turns out to be a great solution, but we still need read slaves to provide the lists of keys.  Bottom line is that we still use read replication heavily - but we use it for different things than we did in years past.

And then, of course, there’s the issue of memcached failure.  For us, it’s very rare, and thanks to the way memcached works, it rarely hampers system performance, but when a node fails and needs to be re-filled, we have to go back to disk to get it.  And doing that efficiently means read slaves again.

For us, memcached plus MySQL replication is true magic.  Brian’s a very smart guy, and I realize he wrote the post to get people thinking and talking about the issue, but at least for us, read slaves are here to stay.

MySQL: Planet MySQL

Replication will live!

Brian exposed some of his internal letters about the death of replication (caused by memcached). Back when he wrote this, I responded a bit too. Now, as quite a few people really want to bury replication, let me point out some of the reasoning why it will live.

First of all, both MySQL and memcached are slow (however you look at it, they’re both fast) - in a proper gigabit environment both respond in a millisecond or so (well, MySQL is closer to 1.5ms). The major task becomes getting as much work as possible done in that round trip.

Replication lag? The major problem with it was fixed by Google patches back in 4.0, finally hitting stock MySQL in 5.1. Now the replication thread doesn’t get queued due to concurrency, and always enters the execution. Use binary log position serialization for reading users, and they will never notice replication lag.
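One reading of that last sentence, sketched in Python: remember the master's binary log position after each user's write, and only send that user's reads to a replica that has already executed past it. This interpretation, the `choose_server` function, and the server names are assumptions; in a real setup the positions would come from something like `SHOW MASTER STATUS` on the master and `SHOW SLAVE STATUS` on each replica.

```python
def choose_server(user_last_write, replica_positions):
    """Pick a replica that has caught up with the user's last write.

    Positions are (binlog_file, offset) tuples. They are compared as plain
    tuples, which assumes binlog file names are zero-padded to equal width.
    Returns "master" when no replica is far enough along.
    """
    if user_last_write is None:
        user_last_write = ("", 0)    # user never wrote: any replica will do
    for name, executed in sorted(replica_positions.items()):
        if executed >= user_last_write:
            return name
    return "master"


replicas = {
    "replica1": ("mysql-bin.000042", 900),
    "replica2": ("mysql-bin.000042", 2000),
}

# The user's last write landed at offset 1500 in file 000042.
target = choose_server(("mysql-bin.000042", 1500), replicas)
print(target)
```

Here `replica1` is still behind the user's write, so the read goes to `replica2`. Users who have not written recently can read from any replica, and only a user whose write no replica has reached yet falls back to the master.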

More servers? Also more performance. Putting everything to memcached? Lots of stuff still has to be written to database. Once it is in database anyway, one can query it from database too. In low hitrate situations using memcached will be 3x slower than just fetching data from database (get/get/set vs get). I’ve seen lots of code that was enthusiastic to use memcached, but authors didn’t actually try to profile what the hit ratios are.
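The get/get/set arithmetic can be worked through with the rough latencies quoted earlier (about 1 ms for memcached, about 1.5 ms for MySQL). This Python sketch is an illustration only: the function names are made up, and it assumes a memcached set costs the same as a get and that the database query is a simple one.

```python
def cached_read_ms(hit_rate, cache_ms=1.0, db_ms=1.5):
    """Expected latency of a lookup that tries memcached first.

    Every lookup pays one cache get; a miss also pays a database get
    plus a cache set to store the result.
    """
    miss_rate = 1.0 - hit_rate
    return cache_ms + miss_rate * (db_ms + cache_ms)


def break_even_hit_rate(cache_ms=1.0, db_ms=1.5):
    """Hit rate at which the cached path costs the same as the database alone."""
    return 1.0 - (db_ms - cache_ms) / (db_ms + cache_ms)


for rate in (0.0, 0.5, 0.8, 0.95):
    print("hit rate %.2f: %.2f ms via cache, 1.50 ms direct" % (rate, cached_read_ms(rate)))

threshold = break_even_hit_rate()
```

Under these assumptions a cold cache costs three round trips instead of one, and the cached path only wins once the hit rate passes the break-even point. The more expensive the database query, the lower that break-even point drops, which is why profiling hit ratios per use case matters.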

The major problem with memcached is that it is a hash table. All it supports in data retrieval is asking for a key and getting a value. Which works great in situations where one just has a key and gets a value. Now if 50 keys are needed, memcached will need 50 lookups, quite often routed to 50 different servers. That’s a single database B-Tree read. How does one fetch all keys from 1 to 10000 with memcached? That’s right - ask for all of them. Of course, it is easy to resolve some of the inefficiency by having separate memcached clusters for different tasks, or appending information to multiple tracking objects, but that’s where the ease of distribution starts fading, and development and administration needs surface.

memcached APIs are now starting to support replication too - but flapping hosts can then get the environment out of sync quite fast (a host disappears, the failover host starts getting traffic, the host comes back with stale data…). The solution - object generation management, reading from multiple hosts, etc. - here again means that solving a simple problem already needs quite some complexity.

Add the ACID properties of databases, which quite often make the whole development much easier, and which end up quite difficult to achieve in a completely distributed ‘get/set’ environment.

And by the way - memcached can be outgunned. Hot objects can be cached directly on local application server stores, like the APC object cache, the file system, etc. New application servers nowadays have lots of memory.. :) Need global state? Just broadcast it to all.

There’s much more that replicated databases can provide - more complex views, all indexed and snappy; a single line change doesn’t need invalidation of hundreds or thousands of objects around; and it all comes down to interactivity and serving users’ needs better. A single line change is immediately visible to all the users around.

Brian suggests using job queue systems and pushing everything to memcached - which makes it a dump of stuff instead of a cache. Putting in extra information just because it might be needed ends up causing unnecessary evictions, which decrease the efficiency of the system too. Building those objects needs reading from the database (or another persistent store), and eventually they end up in the database too. Surprise - they can be served from the database as well! :)

Anyway, memcaching ideas are moving forward, and so is database replication. There is lots of room for replication to evolve yet - making it more async, parallel, relaxed. The whole MySQL protocol might be better - now it is all synchronous and boring.

Though replication has a storage overhead - more copies are usually saved - it also allows utilizing those copies in different ways, with different indexing schemes, while still maintaining the same image of the data on different nodes. Even better, such application-specific ‘roles’ of slaves can migrate from one node to another, heating up different segments of data.

The role of database replication will still remain core for scaling out reads for various workflows. A database allows incremental changes to the infinite datasets required by various applications. Replication just multiplies system capacity for presenting those datasets. That’s good. If it was up to me, I’d let it live.

P.S. Our current memcached cluster has 80 nodes, each providing 2GB of storage. When used properly, memcached is a great tool too. :)

MySQL: Planet MySQL

Tangent Software: Memcached Functions for MySQL

This is a set of MySQL UDFs (user defined functions) to work with memcached using libmemcached. With these functions you get, set, append, prepend, delete, increment, decrement objects in memcached, as well as set which servers to use and which behavior t

opensource: del.icio.us tag/opensource

Pondering writing MySQL storage engine for MogileFS.

A MySQL network storage engine for MogileFS.

After writing the MySQL Storage Engine for Amazon S3, and having heavily dug into the storage engine for MemCacheD, it can't really be that hard.

Would it be useful?

MySQL: Planet MySQL

Sharedance

Sharedance is a high-performance server that centralizes ephemeral key/data pairs on remote hosts, without the overhead and the complexity of an SQL database. It is designed to share caches and sessions between a pool of web servers.

opensource: del.icio.us tag/opensource

Integration news x 2

Brian Aker starts work on a memcache engine for mysql. so your memcache cache acts just like a table.

the big thing here which I’ve seen asked for a couple of times on the memcached list is the ability to see a list of keys.

mysql> select * from foo1 WHERE k="mine";

freaking amazing.. I love these kind of mashups.

and the 2nd important event.

Django is starting a branch to integrate SQLAlchemy

MySQL: Planet MySQL

memcached performance

two interesting posts arrived on the memcached list which might be interesting to performance people.

The first was a comparison, “The fastest language binding”, of which ‘P’ language performed better. Note that the PHP version actually uses libmemcache, a ‘C’ library, which goes some way toward explaining the wild disparity in speeds.

The 2nd more interesting one (to me) was the discussion of how Digg switched from using mysql to memcached with v3 of their new interface to handle storing sessions, due to a hardware crash on their mysql server.

others mentioned using InnoDB for this instead of MyISAM, with the biggest issue being clearing out expired sessions (which memcached does for you with less overhead), but storing the sessions in the database still suffered due to OS-contention.

of course with django you can choose either, to cache your stuff.. but the session handling is stored directly in the database .. looks like I have a weekend project ;-)

MySQL: Planet MySQL

Robot Co-op Server Software - Segment7

A run-down of the server software that powers 43 Things, et al.

awstats: del.icio.us tag/awstats