Content Tagged availability

Even Worse than It Appears?

Two things today.

First, thanks to Cote, Matt, Javier and others for their kind words. I am tremendously excited to be working with Hyperic. I’ve liked the company for a long time, and I’m even more impressed by the team and the strategy now that I am spending a couple of days a week here. I haven’t abandoned retirement altogether, but I have allowed it to erode a little bit because I like this opportunity so much.

Second, I want to pile on Javier’s post on the availability issues at Amazon over the past several days.

It’s worth pointing out that it had to be a pretty lousy weekend for the people responsible for running Amazon’s infrastructure. If you take a step back, the only reason the downtime is remarkable is that it’s so rare. Nobody blinks when Twitter goes off-line. When my own favorite retailer since 1997 disappears, I notice it, because it simply never happens. I bet that things settle down quickly and we get back to the fast and reliable storefront that we’ve all come to expect.

Javier and others have touched on a likely cause of the outage: Complexity. As systems get more moving parts, they become harder to monitor and maintain. Many hope that the move to cloud computing will make things better; as you use infrastructure in the cloud, the thinking goes, you’ll be able to rely on the cloud service provider to keep it running.

As the downtime with Amazon’s storefront demonstrates, that’s a false hope. If you rely on computing services anywhere, you need to monitor them, and you need to understand how their availability affects your operations. IT shops are running more applications — JBoss, Tomcat, MySQL, home-grown software to run their businesses, along with the laundry list of proprietary and legacy applications they’ve installed over the years. These interact with one another. Every one of these software programs, and every connection among them, is a new potential source of failure.
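A back-of-the-envelope calculation shows why each added component matters. If a request has to pass through several components in series, and their failures are independent (a simplifying assumption, not something claimed above), the end-to-end availability is the product of the individual availabilities. The sketch below uses component names from the paragraph above; the availability figures and function names are made up for illustration.

```python
from math import prod

HOURS_PER_YEAR = 24 * 365


def serial_availability(availabilities):
    """Availability of a chain in which every component must be up.

    Assumes component failures are independent of one another.
    """
    return prod(availabilities)


def expected_downtime_hours(availability, hours=HOURS_PER_YEAR):
    """Expected hours of downtime over a period at the given availability."""
    return (1.0 - availability) * hours


# Hypothetical per-component availabilities, for illustration only.
components = {
    "Tomcat": 0.999,
    "JBoss": 0.999,
    "MySQL": 0.999,
    "home-grown app": 0.995,
}

if __name__ == "__main__":
    total = serial_availability(components.values())
    print(f"end-to-end availability: {total:.4f}")
    print(f"expected downtime per year: {expected_downtime_hours(total):.1f} hours")
```

Under these assumptions the chain is always less available than its weakest component, which is why monitoring each piece and each connection matters.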

We hardly ever abandon old systems and infrastructure. We only add new ones. Increased complexity is an irresistible force of nature, and managing it requires new techniques and new tools.

Back to the Amazon outage specifically: I’ve seen a couple of quotes in the media from people who have said, more or less, “Gee, whatever they changed that messed things up, they should have changed it during off hours.”

The fact of the matter is that there is no longer any such thing as “off hours.” For Amazon certainly, the storefront runs constantly. It may be nighttime in North America, but it’s daylight in Eastern Europe and Asia. More and more businesses — and especially those that deliver services over the Internet — simply never get to shut their computers down for maintenance. Their operations infrastructure has to take that into account.

More software running on more hardware in more places equals more complexity. At the same time, users all over the globe expect instantaneous access to data and services from anyplace, anytime. That combination means that IT professionals are staring at some pretty serious problems. The situation is even worse than it appears, though: For many businesses, as for Amazon, if the computers go down, the money stops flowing.

I’m glad to be at Hyperic because we’re working on the hard problems. Manageability of core infrastructure is the iceberg in front of most businesses these days.

MySQL: Planet MySQL

Sun StorageTek Availability Suite at OpenSolaris.org

Allows volumes and/or their snapshots to be replicated between physically separated servers in real time, or by point-in-time, over virtually unlimited distances.

zfs: del.icio.us/tag/zfs

Replication is dead, long live Replication!

Brian Aker has found general agreement with his post: "The Death of Read Replication".

Arjen Lentz says "I think Brian is right...", and Frank Mash confirmed: "what Brian says about replication, caching and memcached is very true".

Just like Video killed the Radio Star, it looks like maybe Memcached killed the Replication Hierarchy!

But of course, Brian and others are talking about replication for scaling reads.

In my session on PBXT next week at the conference, I will be talking about how we plan to use synchronous replication to produce an HA solution for MySQL at the engine level.

I will also discuss how some flexibility in the PBXT architecture makes it possible to actually scale writes efficiently, as mentioned by Arjen in his blog.

So don't miss it:

Inside the PBXT Storage Engine
10:50am - 11:50am Thursday, 04/17/2008
Ballroom G

PrimeBase-XT: PBXT Blog
