Updated with Statement from Amazon: Amazon’s S3 cloud storage service went offline this morning for an extended period of time — the second big outage at the service this year. In February, Amazon suffered a major outage that knocked many of its customers offline.
It was no different this time around. I first learned about today’s outage when avatars and photos (stored on S3) used by Twinkle, a Twitter client for the iPhone, vanished.
My big hope was that it would come back soon, but popular S3 customers such as SmugMug were offline for more than eight hours, an awfully long time for Amazon’s Web Services division to take to bring back the service. As our sister blog, WebWorkerDaily, points out:
With two relatively serious outages in the space of 6 months, some will be asking the question of why depend on S3? The answer is simple: the rates are hard to beat, especially for service that doesn’t require any sysadmin budget.
That said, the outage shows that cloud computing still has a long road ahead when it comes to reliability. NASDAQ, Activision, Business Objects and Hasbro are some of the large companies using Amazon’s S3 Web Services. But even as cloud computing starts to gain traction with companies like these and most of our business and communication activities are shifting online, web services are still fragile, in part because we are still using technologies built for a much less strenuous web.
Update: Antonio Rodriguez, founder of Tabblo, now part of HP, asks the pertinent $64,000 question on his blog:
…if AWS is using Amazon.com’s excess capacity, why has S3 been down for most of the day, rendering most of the profile images and other assets of Web 2.0 tapestry completely inaccessible while at the same time I can’t manage to find even a single 404 on Amazon.com? Wouldn’t they be using the same infrastructure for their store that they sell to the rest of us?
Update #2: Building an offline redundancy for Amazon S3 could be a big opportunity, Dave Winer says.
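For what it’s worth, the simplest form of that redundancy is easy to picture: keep a second copy somewhere else and read from it when the primary stops answering. The Python below is only a toy sketch under that assumption. `InMemoryStore`, `MirroredStore` and `StoreUnavailable` are made-up names standing in for S3 and a mirror, and none of this reflects how Winer or Amazon would actually build it.

```python
class StoreUnavailable(Exception):
    """Raised by a store that cannot serve requests right now."""


class InMemoryStore:
    """Hypothetical stand-in for a remote object store.

    Setting `online` to False simulates an outage.
    """

    def __init__(self):
        self.objects = {}
        self.online = True

    def put(self, key, data):
        if not self.online:
            raise StoreUnavailable(key)
        self.objects[key] = data

    def get(self, key):
        if not self.online:
            raise StoreUnavailable(key)
        return self.objects[key]


class MirroredStore:
    """Writes every object to two stores; reads fall back to the mirror."""

    def __init__(self, primary, mirror):
        self.primary = primary
        self.mirror = mirror

    def put(self, key, data):
        # Write the mirror first so a copy exists even if the primary write fails.
        self.mirror.put(key, data)
        self.primary.put(key, data)

    def get(self, key):
        try:
            return self.primary.get(key)
        except StoreUnavailable:
            # Primary is down: serve the redundant copy instead.
            return self.mirror.get(key)


primary, mirror = InMemoryStore(), InMemoryStore()
store = MirroredStore(primary, mirror)
store.put("avatar.png", b"image-bytes")

primary.online = False          # simulate the outage
data = store.get("avatar.png")  # still served, from the mirror
```

The hard parts of a real product (keeping the two copies in sync, paying for the second copy, deciding when the primary is truly down) are exactly what this sketch leaves out.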
Update #3: A reader sent me an email and asked these two questions
Random Thought: The S3 outage points to a bigger issue. The cloud has many points of failure: routers crashing, cables getting accidentally cut, load balancers getting misconfigured, or simply bad code.
Update/Statement from Amazon in response to our questions:
As a distributed system, the different components of S3 need to be aware of the state of each other. For example, this awareness makes it possible for the system to decide which redundant physical storage server to route a request to.
We experienced a problem with those internal system communications, leaving the components unable to interact properly, and customers unable to successfully process requests. After exploring several alternatives, the team determined it had to take the service offline to restore proper communication and then bring service online again.
These are sophisticated systems and it generally takes a while to get to root cause in such a situation—we will be providing our customers with more information when we’ve fully investigated the incident. We’re proud of our operational performance in operating S3 for almost 2.5 years, and our customers have generally been pleased with the reliability and performance of the service. But any downtime is unacceptable and we won’t be satisfied until it is perfect.
Amazon S3 is used heavily by a number of services behind Amazon’s retail websites. Those services were impacted, but the retail website did not show noticeable problems because it mostly uses cached data.
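Amazon’s statement doesn’t say how that caching works, so the following is only a guess at the general idea: a cache that keeps serving its last good copy when the backend fails. It is a minimal Python sketch, and `StaleOkCache`, `fetch` and `ttl_seconds` are hypothetical names of mine, not anything Amazon has described.

```python
import time


class StaleOkCache:
    """Cache that serves an expired entry if refreshing it fails."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch        # callable that loads a value from the backend
        self._ttl = ttl_seconds    # how long an entry counts as fresh
        self._clock = clock        # injectable so the behavior is easy to test
        self._entries = {}         # key -> (value, time stored)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[1] < self._ttl:
            return entry[0]        # fresh hit: the backend is never touched
        try:
            value = self._fetch(key)
        except Exception:
            if entry is not None:
                return entry[0]    # backend down: fall back to the stale copy
            raise                  # nothing cached, so the failure is visible
        self._entries[key] = (value, now)
        return value
```

With something like this in front of a storage service, pages built from already-cached objects would keep loading during an outage, while anything requested for the first time would still fail. That would be consistent with a storefront that looks fine while third-party avatars vanish.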

opensource: del.icio.us tag/opensource
In an ongoing effort to improve its suite of web services, Amazon said today that it’s adding persistent storage features to its EC2 compute service. Why is this important?
As the AWS blog explains, up until now you were able to attach 160 GB to 1.7 TB of storage to an EC2 “instance.” (An “instance” is essentially the server.) As long as the server was running, the storage remained available. Once you shut it down, the storage disappeared. “Applications with a need for persistent storage could store data in Amazon S3 or in Amazon SimpleDB, but they couldn’t readily access either one as if it was an actual file system,” the blog says.
Amazon CTO Werner Vogels, a keynote speaker at our Structure 08 conference, on his blog describes persistent storage this way: “It basically looks like an unformatted hard disk. Once you have the volume mounted for the first time you can format it with any file system you want or if you have advanced applications such as high-end database engines, you could use it directly.”
In other words, this new persistent storage essentially acts like an external hard drive “attached” to your “instance.” (An earlier version of this post said it could also be plugged into more than one “instance,” making it a shared drive. I misreported that bit; the error is regretted.) We are a little intrigued by how Amazon is making this happen. Some experts believe it might be using iSCSI, but persistent iSCSI at such a large scale is expensive. (If anyone has a better explanation, please let me know.)
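To make the difference concrete, here is a toy Python model of the two kinds of storage. It is not Amazon’s API: `Instance`, `Volume`, `attach` and `terminate` are illustrative names I made up, and the only behavior modeled is the one described above, that instance storage vanishes on shutdown while the new volumes do not.

```python
class Volume:
    """Toy persistent volume: its contents outlive any instance."""

    def __init__(self):
        self.files = {}


class Instance:
    """Toy server with ephemeral local storage and an optional volume."""

    def __init__(self):
        self.local_disk = {}   # ephemeral: wiped when the instance goes away
        self.volume = None

    def attach(self, volume):
        self.volume = volume

    def terminate(self):
        self.local_disk = {}   # instance storage disappears
        self.volume = None     # the volume is detached, not erased


volume = Volume()

first = Instance()
first.attach(volume)
first.local_disk["scratch.tmp"] = "temporary data"
first.volume.files["orders.db"] = "rows that must survive"
first.terminate()

second = Instance()
second.attach(volume)
survived = second.volume.files["orders.db"]  # data written by the first server
```

The scratch file dies with the first server, while the database file is still there when a second server picks up the same volume.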
What it all means is that AWS/EC2 has gone up a few notches in terms of reliability. This reliability will go a long way towards the company offering service-level agreements to customers, especially large enterprises that want to utilize Amazon’s on-demand infrastructure. Alistair Croll earlier this month wrote a post in which he argued that Amazon was going after larger corporations, and today’s announcement bolsters his theory.

Building the Cloud Castle(tm), one brick at a time. A very similar set of operations to CouchDB, but without Couch’s views. Nice SimpleDB vs. CouchDB side-by-side comparison. And more info from someone who’s been playing with it longer.
Kellan-Elliot-Mcrea: Laughing Meme