For many the "Cloud" in Cloud Computing signifies the notion of location independence; that somewhere in the internet services are provided and that to access them you do not need any specific knowledge of where they are located. Many applications have already been built using cloud services and they indeed achieve this location transparency; their customers do not have to worry about where and how the application is being served.
However for developers to do their job properly the cloud cannot be fully transparent. As much as we would like to make it easy and simple for everyone, building high-performance and highly reliable applications in the cloud requires that the developers have more control. For example a reality is that failures can happen; servers can crash and networks can become disconnected. Even if these are only temporary glitches and are transient errors, the developer of applications in the cloud really wants to make sure his or her application can continue to serve customers even in the face of these rare glitches. A similar issue is that of network latency; as much as we would like to see the cloud to be transparent, the transport of network packets is still limited to the speed of light (at best) and customers of cloud applications may experience a different performance depending on where they are located in relation to where the applications are running. We have seen that for many applications that works just fine, but there are developers who would like more control over how their customers are being served and for example would like to give all their customers low latency access, regardless of their location.
At Amazon we have been building applications on these cloud principles for several years now and we are very much aware of the tools that developers need to build applications that are required to meet very high standards with respect to scalability, reliability, performance and cost-effectiveness. We are also listening very closely to the feedback AWS customers are giving us to make sure we expose the right tools for them to do their job. We launched Amazon S3 in Europe to ensure that developers could build applications that could serve data out of a European storage cloud. We launched Regions and Availability Zones (combined with Elastic IPs) for Amazon EC2 such that developers would have better control over where their applications would be running to ensure high-availability. We are now ready to expand the cloud even further and bring the cloud storage to its customers' doorstep.
Today we are announcing that we are expanding the cloud by adding a new service that will give developers and businesses the ability to serve data to their customers world-wide, using low-latency and high data transfer rates. Using a global network of edge locations this new service can deliver popular data stored in Amazon S3 to customers around the globe through local access.
We have developed this content delivery service using the robust AWS principles we know work well for our customers:
This is an important first step in expanding the cloud to give developers even more control over how their applications and their data are served by the cloud. The service is currently in private beta but we expect to have the service widely available before the end of the year. You can get a few more details and sign up to get notified when the service is becoming on this AWS page Also check Jeff Bar's posting on the AWS weblog.
Today Amazon Web Services launched two new features in Amazon EC2 that are essential tools in building highly resilient applications: Elastic IP addresses and Availability Zones.
In summary:
With these two features EC2 customers now have tools to build applications that can tolerate a wide range of failure scenarios.
For more details visit the EC2 detail page and the Forum announcement
Update: excellent articles by the guys at RightScale: Using Elastic IP in switch-over scenarios, using Availability Zones to set up a fault-tolerant site and combining Elastic IP and Availability Zones.
Recently there has been a lot of discussion about the concept of eventual consistency in the context of data replication. In this positing I would like to try to collect some of the principles and abstractions related to large scale data replication and the trade-offs between high-availability and data consistency. I consider this work-in-progress as I don’t expect to get every definition crisp the first time.
There are two ways of looking at consistency. One is from the developer / client point of view; how they observe data updates. The second way is from the server side; how updates flow through the system and what guarantees systems can give with respect to updates.
Historical
In an ideal world there would only be one consistency model; when an update is made all observers will see that update. The first time this surfaced as difficult to achieve was in the database systems in the late seventies. The best “period piece” on this topic is by Bruce Lindsay, et al, “Notes on Distributed Databases”, Research Report RJ2571(33471), IBM Research, July 1979. The fundamental principles for database replication are laid out in these notes and number of techniques are discussed deal with achieve consistency. Many of these techniques try to achieve distribution transparency; that to the user of the system it appears as if there is only one system instead of a number of collaborating systems. Many of these systems took the approach that it was better to fail the system than to break this transparency.
In the mid-nineties, with the rise of larger internet systems, these practices were revisited. At that time one started to give consideration to the idea that maybe availability was the more important property of these systems. As a result people were struggling what it should be traded-off against. Eric Brewer, Berkeley systems professor and at that time head of Inktomi, brought the different trade-offs together in a keynote to the PODC conference in 2000. Eric presented the CAP theorem, which states that of three properties of shared-data systems; data consistency, system availability and tolerance to network partition one can only achieve two at any given time. A more formal confirmation can be found in a paper by Gilbert and Lynch.
A system that is not tolerant to network partitions can achieve data consistency and availability, and often does so by using transaction protocols. To make this work, client and storage systems are part of the same environment and they fail as a whole under certain scenarios and as such clients cannot observe partitions. An important observation is that in larger distributed scale systems, network partitions are a given and as such consistency and availability cannot be achieved at the same time. This means that one has two choices on what to drop; relaxing consistency will allow the system to remain highly available under the partitionable conditions and prioritizing consistency means that under certain conditions the system will not be available.
Both require the client developer to be aware of what the system is offering. If the system emphasizes consistency, the developer has to deal with the fact that system may not be available to take for example a write. If this write fails because of system unavailability the developer will have to deal with what to do with the data to be written. If the system emphasizes availability, it may always accept the write but under certain conditions a read will not reflect the result of a recently completed write. The developer then has to make a decision about whether the client requires access to the absolute latest update all the time. There is a range of applications that can handle slightly stale data and they are served well under this model.
In principle the consistency property of transaction systems as defined in the ACID properties is a different kind of consistency guarantee. In ACID consistency relates to the guarantee that when a transaction is finished the database is in a consistent state; for example when transferring money one from account to another the total amount held in both accounts should not change. In ACID based systems this kind of consistency is often the responsibility of the developer writing the transaction, but can be assisted by the database managing integrity constraints.
Continue to read "Eventually Consistent" at the All Things Distibuted website.
"For any truly scalable agile environment, self-organization is essential."
User:jeyrb: jey's network's del.icio.us bookmarks
Web
Database
design
systems
distributed
internet
adaptability
There is a lot of positive feedback about the Dynamo paper but I noticed that something I wrote in introducing the paper is being misunderstood. This was my fault, I wrote it too strongly.
What I meant by internal-only is that Dynamo is not directly exposed externally. However, Dynamo and similar Amazon technologies are used to power parts of our Amazon Web Services, such as S3. We are using these kinds of technologies to constantly improve the services we offer.
In two weeks we’ll present a paper on the Dynamo technology at SOSP, the prestigious biannual Operating Systems conference. Dynamo is internal technology developed at Amazon to address the need for an incrementally scalable, highly-available key-value storage system. The technology is designed to give its users the ability to trade-off cost, consistency, durability and performance, while maintaining high-availability.
Let me emphasize the internal technology part before it gets misunderstood: Dynamo is not directly exposed externally as a web service; however, Dynamo and similar Amazon technologies are used to power parts of our Amazon Web Services, such as S3.
We submitted the technology for publication in SOSP because many of the techniques used in Dynamo originate in the operating systems and distributed systems research of the past years; DHTs, consistent hashing, versioning, vector clocks, quorum, anti-entropy based recovery, etc. As far as I know Dynamo is the first production system to use the synthesis of all these techniques, and there are quite a few lessons learned from doing so. The paper is mainly about these lessons.
We are extremely fortunate that the paper was selected for publication in SOSP; only a very few true production systems have made it into the conference and as such it is a recognition of the quality of the work that went into building a real incrementally scalable storage system in which the most important properties can be appropriately configured.
Dynamo is representative of a lot of the work that we are doing at Amazon; we continuously develop cutting edge technologies using recent research, and in many cases do the research ourselves. Much of the engineering work at Amazon, whether it is in infrastructure, distributed systems, workflow, rendering, search, digital, similarities, supply chain, shipping or any of the other systems, is equally highly advanced.
The official reference for the paper is:
Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swami Sivasubramanian, Peter Vosshall and Werner Vogels, “Dynamo: Amazon's Highly Available Key-Value Store”, in the Proceedings of the 21st ACM Symposium on Operating Systems Principles, Stevenson, WA, October 2007.
A pdf version is available at the full online version.
The text of the paper is copyright of the ACM and as such the following statement applies:
© ACM, 2007. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in SOSP’07, October 14–17, 2007, Stevenson, Washington, USA Copyright 2007 ACM 978-1-59593-591-5/07/0010
Dynamo: Amazon’s Highly Available Key-value Store
Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall and Werner Vogels
Amazon.com
Abstract
Reliability at massive scale is one of the biggest challenges we face at Amazon.com, one of the largest e-commerce operations in the world; even the slightest outage has significant financial consequences and impacts customer trust. The Amazon.com platform, which provides services for many web sites worldwide, is implemented on top of an infrastructure of tens of thousands of servers and network components located in many datacenters around the world. At this scale, small and large components fail continuously and the way persistent state is managed in the face of these failures drives the reliability and scalability of the software systems.
This paper presents the design and implementation of Dynamo, a highly available key-value storage system that some of Amazon’s core services use to provide an “always-on” experience. To achieve this level of availability, Dynamo sacrifices consistency under certain failure scenarios. It makes extensive use of object versioning and application-assisted conflict resolution in a manner that provides a novel interface for developers to use.
Categories and Subject Descriptors
D.4.2 [Operating Systems]: Storage Management; D.4.5
[Operating Systems]: Reliability; D.4.2 [Operating Systems]:
Performance;
General Terms
Algorithms, Management, Measurement, Performance, Design, Reliability.
Continue to read for the full online version.
In the recent years Mike Stonebraker has been advocating that the current commercial databases have become one-size-fits-all tools that are so general and heavy-weight that they do not excel in any particular area. Mike has written some papers on this topic of comparing general databases with specialized storage engines in areas of data-warehousing, text search, stream processing and scientific processing. In these papers he demonstrates that these specialized engines can run orders of magnitude faster than the commercial databases, using commodity hardware. I am not sure all the comparisons are fair, but he makes a compelling case. Of course Mike is a principal in Vertica which directly competes with what he calls “The Elephants”. Given that I have been close in the past to another example about how commercial goals can cloud academic judgment, am I rather cautious in evaluating these papers.
This week at VLDB Mike gave an invited talk on this topic in the Industrial Research track. His talk centered on that while he had previously proven that specialized approaches could run circles around the Elephants, he now could also demonstrate that OLTP, which is the bread-and-butter of the database industry, could be greatly outperformed by new architectures. In the paper and in his talk he put out a challenge to the research community, and to graduate students in particular, to take a particular interesting application area and build specialized solutions that provide at least a 50X improvement over the current products.
I like this challenge, given that 50X is likely to be able to make impact, where 2-4X in general can be easily compensated for by the next generation hardware. But something bugs me about the challenge and also about some of the demonstrations in the papers; 50X is still focused on scaling-up, just as many of the current database systems do, instead of scaling out, which is what the world really needs. The evidence in the paper is indeed about single box performance. This continuing N=1 thinking will never yield systems that can break through the current scalability limitations of enterprise software, regardless whether it runs 50 times faster or not.
If I could rewrite the challenge I would go for asking for “demonstrating performance at scale”; That one can achieve the rock solid performance and reliability guarantees by just incrementally increasing the components in the system, without any limitations. And every scaling axis needs to be satisfied; request rates, request complexity, data sets size, etc.
Only focusing on 50X just gives you faster Elephants, not the revolutionary new breeds of animals that can serve us better.
"For any truly scalable agile environment, self-organization is essential."
When I was building one of my first teams at Amazon, one that had to work on some really advanced distributed systems technology I put up a job description on this weblog. I was certainly pleased with the responses. Last year at a conference I heard from some of my former academic colleagues that they were using this description to educate their students abput where they were lacking in knowledge or experience. “Werner’s requirements” were used to explain to them that if they wanted to work on one of the really interesting distributed systems of this world, they better recognize the importance of [insert some random topic]
There are only a few engineers at Amazon who work directly for me and I currently have a job opening for such a position. It has most of the requirements from the previous description, but in this role there is more emphasis on analysis and modeling. If you are interested and you feel you qualify you can apply online at the Amazon careers site for job # 025213 or send email to the address on the right column of this page.
This is a job with some really tough requirements but it is an important job as you will have direct influence into how systems are build at Amazon and that is something we take very serious.
Senior Research Engineer in the Office of the CTO
Amazon.com's website is the most well-known front end to one of the world's largest and busiest service-oriented architectures. Its systems requirements are very challenging: maintain high-availability and guaranteed performance in an ultra-scalable fashion while being very cost efficient. From webpage rendering to order pipeline workflow, from data-warehouse to distributed caching, all require unique solutions. Many of these solutions require significant innovation: often these challenges have not even been addressed in research in a production setting at the scale of Amazon.
As an engineer working directly for the Chief Technology Officer, you will be confronted with Amazon.com's toughest problems. You need to be able to dive deep on technology issues, use your analytical skills to reduce a problem to its fundamentals, and create solutions. Important in this process is that you use computer science theory and knowledge of advanced research to design solutions that are fundamental in nature and can provide a solid basis for the Amazon.com platform on which to build.
The nature of the Amazon.com platform is that of a massive distributed system. As with any distributed system its overall scalability can often be reduced to the scalability of its state management systems. Much of your work will touch the way that data is transferred and stored on tens of thousands of servers through many datacenters through the world. You will need to have a thorough understanding of distributed storage systems, scalable database technologies and data stream processing.
In this position you work on investigating the fundamentals of the Amazon system architecture and use modeling to create insight into its reliability, durability, efficiency, performance and scalability. A particular emphasis is on the use of economic models to reason about the optimal use of resources and to build a proper foundation for service pricing.
This position requires you to have good communication skills as a significant portion of your time will be spend interacting with other Amazon engineers company-wide. You will need to be able to produce written materials and presentations that target engineers and projects managers as well as the senior executives. You will mentor engineers on fundamentals of distributed systems and computer science theory.
What specific things are we looking for in you?
You know your distributed systems theory: You know about logical time, snapshots, stability, message ordering, but also acid and multi-level transactions. You have heard about the FLP impossibility argument. You know why failure detectors can solve it (but you do not have to remember which one diamond-w was). You have at least once tried to understand Paxos by reading the original paper.
You have a good sense for distributed systems practice: you can reason about churn and locality in DHTs. You intuitively know when to apply ordered communication and when to use transactions. You can reason about data consistency in a system where hundreds of nodes are geographically distributed. You know why for example autonomy and symmetry are important properties for distributed systems design. You like the elegance of systems based on epidemic techniques.
You have good common sense about scale and availability: you frown when someone mentions two-phase commit in the same sentence as high-availability. You also realize that protocols that require a system "to be stable for a sufficiently long period of time" are not a good basis for building real systems. You understand the elegance of state-machine replication, but understand why it is hard to apply at large scale. You have a solid intuition about the impact of design decisions on the ability to achieve data consistency, and you are not scared by the idea of building systems based on 'eventually consistent' data.
You know about the advances in database technologies. You understand how database performance is optimized and how data-partitioning impacts query optimization. You realize what the limitations of commercial databases are and possess a good intuition about where solutions can be found. You are aware that column orientation and stream processing are not just research topics but actually solve hard problems.
Some of your heroes have actually built real systems: worshipping Dijkstra and Lamportis okay as long you also know why Jim Gray and Bruce Lindsay deserve a red carpet. You are not afraid to confront Felipe Cabrera or Marvin Theimer when you think they are wrong (never happens, of course :-)).
You have actually built some real systems yourself. At work or at school, you must have faced some real hard distributed problems and solved them. You may have been involved in an open-source project that has solid distributed systems components.
In summary:
For this position we require a PhD in computer science with expertise in distributed systems and you must have demonstrated expertise in modeling complex distributed systems. If you have an equivalent advanced degree with demonstrated expertise in this field through publications and/or completed products or projects you may also apply.
Update: in response to the question how flexible is the PhD requirement?: It is up to you to convince me it is not relevant in your case. That will be hard, but not impossible.
Carbonado is an extensible, high performance persistence abstraction layer for Java applications, providing a relational view to the underlying persistence technology. Persistence can be provided by a JDBC accessible SQL relational database, or it can be a BDB. It can also be fully replicated between the two.
Even if the backing database is not SQL based, Carbonado still supports many of the core features found in any kind of relational database. It supports queries, joins, indexes, and it performs query optimization. When used in this way, Carbonado is not merely a layer to a relational database, it is the relational database. SQL is not a requirement for implementing relational databases.
The Amazon engineers who collaborated on developing Carbonado over time received feedback that there was a lot of interest in the developer community outside of Amazon for this technology. We decided to release this software through an open source process for other developers to use, improve and extend.
Congratulation to Brian and his close collaborators for developing excellent technology and putting in the work to make it publicly available and to Don, Ryan and Stephanie for navigating all the legal and other obstacles to make this a reality.
Last week I gave a keynote at the ACM Principles of Distributed Computing Conference (PODC) on the topic of technology transfer. My choice of topic was triggered by recent presentations by a number of other research luminaries, who had remarked that the distributed computing research community had failed to make its mark; lots of good ideas, little impact.
I subscribe to a longer term point of view when it comes to the transformation of technology into successful products. Richard Gabriel created a model that lays out the time it takes and means by which innovations become successful consumable products[1]. It is certainly fits the market success of a number of the Xerox Parc innovations, the spreadsheet, or even the Web. To illustrate, hyperlinks and markup languages were developed in the mid sixties, the tcp/ip based networks came to life in the seventies, and it wasn’t until the mid nineties before the combination of these three turned into the basis for a mass consumer product. Gabriel’s presentation on “Models of Software Acceptance: How Winners Win” has more examples and a better connection to the “Crossing the Chasm [2]” style of thinking.
I believe this applies to much of the deep, fundamental, distributed systems material as well. Felipe Cabrera has said, for example, that when Vista ships next year with the support of fine grained transactions in programming languages, it will have been more than 20 years after the concepts where developed in the Quicksilver project at IBM.
But things are accelerating. Amazon.com and other places use high-quality, massively distributed systems to their advantage and require the use of recent research technology to exploit this massive scale. We see an increased adoption of advanced distributed systems beyond established techniques such as edge caching, lazy processing, fusions & aggregation, etc.
Adoption of more recent research technology for use in products is not a walk in the park. Engineers have to be very determined to overcome the many roadblocks that come with early adoption.
Unrealistic assumptions
Research is focused on the details of the technology itself, and not very focused on the application context of the technology. Often, to be able to make progress in research, you need to restrict the environment it can be applied to. For example, many academics will confess to have made the assumption that failures of component are not correlated. This absolutely unrealistic assumption will come back to haunt you in real life, where failures frequently are correlated, as they are often triggered by external or environmental events.
When selecting research technology, it is often a major exercise to discover what exact assumptions the researcher made. Then, the even more difficult exercize is figuring out whether you can live with those assumptions, whether the assumptions were relevant at all, or whether the may impede the adoption of the technology. And in the latter case, whether there is something we can do to bring the research to more realistic standards.
Uncertainty
Many of the insurmountable assumptions deal with reasoning away uncertainty. By turning life into a state machine in which no surprises can be found, one has the perfect world in which everything is clean and organized. There is a limit to how much you can trick life into being predictable and how much control you think you will have to keep life in-check. At small scale you may succeed, but when your systems grow in size and complexity you will lose control. As such, building scalable systems is all about letting go of control. (Turing’s Type I organizations)
In Control Theory, for a long time, researchers were convinced that practitioners did not want to use their research because there was too much complex math in it. It turned out however that the research was largely irrelevant in practice because it didn’t model a realistic world. The moment researchers started to produce work that explicitly took uncertainty into account, their work was rapidly adopted by engineers and architects. Ironically the math has only grown more complex…
In distributed systems we see a similar pattern arising; research which realistically models uncertainty is more readily useful for adoption. Randomization and self-organizing systems are crucial techniques for scaling systems in the real world.
A perfect world
The last topic I want to mention is the use of academic publications as a source of technology selection. Academics often battle out subtle competing views in their research papers. But if there are at least 10 competing approaches to implementing consensus in distributed systems, an engineer needs to make a judgment call on which approach would be best to solve his problems. If the academics can’t even make their mind up on what appears to be the right way, how can their customers be expected to do this for them.
Papers are often written in an extremely positive manner: “This is, once again, the best improvement to life since sliced bread”. There is hardly any self-criticism. There certainly are no details about the things that didn’t work, and why they didn’t work. And let’s not even start talking about the use of statistics in system research papers. Do we really only care about averages? That is, of course, assuming the experiments were realistic in the first place.
You need to re-execute a number of the most promising research achievements in realistic settings to help your selection. There is no way around it. Which means that these research achievements will only be considered if an engineer really needs these results because it will be very time consuming.
Occam’s razor
This is an occasion where we actually use Occam’s Razor in its original sense; if two approaches produce the same result, select the one with the fewest assumptions. We have seen frequently that this selection criteria will lead you to the technology that has the greatest likelihood of being adopted.
entia non sunt multiplicanda praeter necessitatem
[1] Gabriel, Richard, "Money through Innovation Reconsidered" in Patterns of Software: Tales from the Software Community, Oxford University Press, USA; Reprint edition (May 1, 1998) (download book pdf).
[2] Moore, Geoffrey A., “Crossing the Chasm: Marketing and Selling High Tech Products to Mainstream Customers” HarperBusiness; Rev edition (July 1999)
Cougaar or Cognitive Agent Architecture is a Java-based architecture for the construction of large-scale distributed agent-based applications.
It is a product of two consecutive, multi-year DARPA research programs into large-scale agent systems spanning eight years of effort. The first program conclusively demonstrated the feasibility of using advanced agent-based technology to conduct rapid, large scale, distributed logistics planning and replanning. The second program developed information technologies to enhance the survivability of these distributed agent-based systems operating in extremely chaotic environments. The resultant architecture, Cougaar, provides developers with a framework to implement large-scale distributed agent applications with minimal consideration for the underlying architecture and infrastructure.