Content Tagged with performance + Ajax

Split the Initial Payload; Why are we sending all JavaScript down in one go?

Steve Souders has another insightful post where he discusses splitting the initial payload for the JavaScript in your page / application.

Steve first outlines how JavaScript can affect how the browser renders a page:

The growing adoption of Ajax and DHTML means today’s web pages have more JavaScript than ever before. The average top ten U.S. web site[1] contains 252K of JavaScript. JavaScript slows pages down. When the browser starts downloading or executing JavaScript it won’t start any other parallel downloads. Also, anything below an external script is not rendered until the script is completely downloaded and executed. Even in the case where external scripts are cached the execution time can still slow down the user experience and thwart progressive rendering.

He then took the Alexa top ten websites and tracked how much of the code was executed before the onload event, based on functions called. The results are below:

[Figure: Initial Payload Usage]

It is easy to understand why this is the case. Putting all the code in one file is simple, and it can feel like caching makes the point moot (which Steve argues against). Steve gets this:

The task of finding where to split a large set of JavaScript code is not trivial. Doloto, a project from Microsoft Research, attempts to automate this work. Doloto is not publicly available, but the paper provides a good description of their system. (You can hear the creators talk about Doloto at the upcoming Velocity conference.) The approach taken by Doloto uses stub functions that download additional JavaScript on demand. This might result in users having to wait when they trigger an action that requires additional functionality. Downloading the additional JavaScript immediately after the page has rendered might result in an even faster page.
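To make the stub idea concrete, here is a minimal sketch. It is not Doloto's code (which is not public); `makeStub` and `loadImplementation` are hypothetical names, and the loader is passed in so the sketch does not assume any particular transport.

```javascript
// Hypothetical sketch of an on-demand stub; not Doloto's actual implementation.
// loadImplementation() stands in for "download and evaluate the real code"
// and must return the real function.
function makeStub(loadImplementation) {
  let real = null; // filled in the first time the stub is called
  return function stub(...args) {
    if (real === null) {
      // First call: the user pays the download cost here.
      real = loadImplementation();
    }
    return real.apply(this, args);
  };
}

// Usage: the expensive code is only "downloaded" when first needed.
let downloads = 0;
const formatDate = makeStub(function () {
  downloads += 1; // in a real page this would fetch extra JavaScript
  return function (year, month, day) {
    return year + '-' + month + '-' + day;
  };
});

const first = formatDate(2008, 5, 12);  // triggers the one and only load
const second = formatDate(2008, 5, 13); // reuses the loaded function
```

Calling the loader right after the page has rendered, rather than waiting for the first call, would correspond to the alternative Steve suggests at the end of the quote.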

Ajax: Ajaxian

CSS Child Selector Performance

Are child selectors slower than their simpler brethren? This is a question that Jon Sykes set out to gather data on after he read the work of Jim Barraud.

His conclusion?

The skinny is that child selectors are a major performance issue.

This seemed to make sense, but to me I needed some sort of proof rather than just being told it's that way by someone, so over the last two days I've tried two approaches to see if I can replicate the issue.

The first one was rather a half-assed idea that afterwards seems fundamentally flawed as a benchmark.

So I took a new approach which does seem to return some valid and rather interesting findings, particularly regarding Safari and Firefox 3 and how they react to child selectors and performance.

The tests show that there is a slowdown when using child selectors over direct class name declarations in IE6, IE7 and Safari 3, with Safari 3 being the most impacted by child selectors. Firefox 2 has some impact, and Firefox 3 doesn’t seem to be impacted at all.

That said, this is a very extreme test, it is not often you’d have 20,000 class definitions in a single page or that all of them would use 4 levels of child selector.
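For a sense of what such an extreme test page involves, here is a rough sketch of a generator for the two kinds of stylesheet. It is not Jon's actual harness; the function name, the `c0`, `c1`, ... class names and the `div` chain are made up for illustration, and measuring render time is left out.

```javascript
// Builds two stylesheets for a rough comparison: one using plain class
// selectors, one using child selectors nested `depth` levels deep.
// The class-name scheme (c0, c1, ...) is made up for this sketch.
function buildTestCss(ruleCount, depth) {
  const classRules = [];
  const childRules = [];
  for (let i = 0; i < ruleCount; i++) {
    const chain = [];
    for (let level = 0; level < depth; level++) {
      chain.push('div');
    }
    // e.g. ".c7 { color: red; }"
    classRules.push('.c' + i + ' { color: red; }');
    // e.g. "div > div > div > div > .c7 { color: red; }"
    childRules.push(chain.join(' > ') + ' > .c' + i + ' { color: red; }');
  }
  return {
    classCss: classRules.join('\n'),
    childCss: childRules.join('\n')
  };
}

// A tiny sample; the test described above used 20,000 rules at 4 levels.
const sample = buildTestCss(3, 4);
```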


Ajax: Ajaxian

Everything you wanted to know about String performance

Tom Trenka has detailed a great analysis of JavaScript string performance across the various browsers. This is important work, and it reminded me of the JVM days, when people tried to use pools and such, only to find out that they were bad for performance once newer VM technology came out. When you try to be too tricky you can end up in a bad state, as new versions optimize the common task and not your trick.

Eugene Lazutkin had a great explanation on Strings in languages:

Many modern languages (especially functional languages) employ “the immutable object” paradigm, which solves a lot of problems including the memory conservation, the cache localization, addressing concurrency concerns, and so on. The idea is simple: if object is immutable, it can be represented by a reference (a pointer, a handle), and passing a reference is the same as passing an object by value — if object is immutable, its value cannot be changed => a pointer pointing to this object can be dereferenced producing the same original value. It means we can replicate pointers without replicating objects. And all of them would point to the same object. What do we do when we need to change the object? One popular solution is to use Copy-on-write (COW). Under COW principle we have a pointer to the object (a reference, a handle), we clone the object it points to, we change the value of the pointer so now it points to the cloned object, and proceed with our mutating algorithm. All other references are still the same.

JavaScript performs all of “the immutable object” things for strings, but it doesn’t do COW on strings because it uses even simpler trick: there is no way to modify a string. All strings are immutable by virtue of the language specification and the standard library it comes with. For example: str.replace("a", "b") doesn’t modify str at all, it returns a new string with the replacement executed. Or not. If the pattern was not found I suspect it’ll return a pointer to the original string. Every “mutating” operation for strings actually returns a new object leaving the original string unchanged: replace, concat, slice, substr, split, and even exotic anchor, big, bold, and so on.
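A quick way to see the immutability Eugene describes is to call a few of these "mutating" methods and check the original afterwards. The variable names are just for illustration.

```javascript
const original = 'banana';

// Each call returns a new string; none of them touches `original`.
const replaced = original.replace('a', 'b'); // a string pattern replaces only the first match
const sliced = original.slice(0, 3);
const upper = original.toUpperCase();

// `original` still holds its initial value at this point.
const unchanged = (original === 'banana');
```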

Tom then went on to detail the rounds that he went through:

  • Round 1: Measuring Native Operations.
  • Round 2: Comparing types of buffering techniques.
  • Round 3: Applying Results to dojox.string.Builder.

There are a few surprises here, and Tom later concludes:

  • Native string operations in all browsers have been optimized to the point where borrowing techniques from other languages (such as passing around a single buffer for use by many methods) is for the most part unneeded.
  • Array.join still seems to be the fastest method with Internet Explorer; either += or String.prototype.concat.apply("", arguments) works best for all other browsers.
  • Firefox has definite issues with accessing argument members via dynamic/variables.
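For reference, the three techniques Tom compares look roughly like this. The function names and the crude timer are mine, not Tom's test code, and the timings will vary by engine, so treat this as a sketch rather than a benchmark.

```javascript
// Three ways to build the same string from an array of parts.
function withPlusEquals(parts) {
  let out = '';
  for (let i = 0; i < parts.length; i++) {
    out += parts[i];
  }
  return out;
}

function withJoin(parts) {
  return parts.join('');
}

function withConcatApply(parts) {
  // concat accepts any number of arguments, so apply() spreads the array
  return String.prototype.concat.apply('', parts);
}

// Very rough timer; results vary by engine and are not a real benchmark.
function timeMs(fn, parts, runs) {
  const start = Date.now();
  for (let i = 0; i < runs; i++) {
    fn(parts);
  }
  return Date.now() - start;
}

const parts = [];
for (let n = 0; n < 1000; n++) {
  parts.push('item' + n);
}

const timings = {
  plusEquals: timeMs(withPlusEquals, parts, 200),
  join: timeMs(withJoin, parts, 200),
  concatApply: timeMs(withConcatApply, parts, 200)
};
```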

Erik Arvidsson reminds us of the reason to use push(): IE6 and its really bad GC.

I look forward to the IE 8 / FF 3 results too.

Ajax: Ajaxian

Cuzillion Video and High Performance Book

Steve Souders has some more rules for us, as he announces a new book that he is working on. His preliminary list of chapters is:

  1. Split the initial payload
  2. Load scripts without blocking
  3. Don’t scatter scripts
  4. Split dominant content domains
  5. Make static content cookie-free
  6. Reduce cookie weight
  7. Minify CSS
  8. Optimize images
  9. Use iframes sparingly
  10. To www or not to www

Steve has a call out to the community on thoughts for rules that you would like to see him cover:

The book should be out in early 2009. As I continue my research on web performance here at Google I’ll come up with another 5-10 rules to include. But I also wanted to ask you for suggested rules. What do you think is the performance killer for your web app? Better yet, what performance best practices have you discovered? For example, I think 3rd party rich media (Flash and JavaScript) ads are the long pole in the tent for many sites, and knowing the best way to embed widgets is growing more and more important.

I got to sit down with Steve to discuss the Cuzillion tool that we posted on last week. Steve talks about the project, and then walks us through a screencast showing how he found a problem with Orkut, and solved it.


Ajax: Ajaxian

Fast Streaming Ajax Proxy

Omar AL Zabir, co-founder and CTO of Pageflakes, has written about a continuous streaming Ajax proxy that solves a problem common to all Ajax proxies: the double delay of downloading content on the server first and only then delivering it to the browser.

The continuous proxy takes the following approach:

  • Read bytes from external server in chunks of 8KB from a separate thread (Reader thread) so that it's not blocked
  • Store the chunks in an in-memory Queue
  • Write the chunks to ASP.NET Response from that same queue
  • If the queue is finished, wait until more bytes are downloaded by the reader thread
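Omar's implementation is ASP.NET with a dedicated reader thread. As a rough JavaScript analogue of the same steps, here is a sketch where an async function plays the role of the reader thread and a small promise-based queue sits between reader and writer. All names (`ChunkQueue`, `readInto`, `writeFrom`, `streamThrough`) are hypothetical, and the upstream download is simulated with a timer.

```javascript
const CHUNK_SIZE = 8 * 1024; // 8KB, as in the list above

// In-memory queue whose consumer can wait for the next chunk.
class ChunkQueue {
  constructor() {
    this.chunks = [];  // chunks downloaded but not yet written
    this.waiters = []; // resolvers for a consumer waiting on an empty queue
    this.closed = false;
  }
  push(chunk) {
    const waiter = this.waiters.shift();
    if (waiter) waiter(chunk);
    else this.chunks.push(chunk);
  }
  close() {
    this.closed = true;
    while (this.waiters.length) this.waiters.shift()(null); // null = end of stream
  }
  next() {
    if (this.chunks.length) return Promise.resolve(this.chunks.shift());
    if (this.closed) return Promise.resolve(null);
    return new Promise((resolve) => this.waiters.push(resolve));
  }
}

// Reader side: slices the upstream body into 8KB chunks as they "arrive".
async function readInto(queue, body) {
  for (let offset = 0; offset < body.length; offset += CHUNK_SIZE) {
    await new Promise((resolve) => setTimeout(resolve, 0)); // simulated network wait
    queue.push(body.slice(offset, offset + CHUNK_SIZE));
  }
  queue.close();
}

// Writer side: forwards each chunk as soon as it is available,
// waiting whenever the queue is empty but not yet closed.
async function writeFrom(queue, write) {
  let chunk;
  while ((chunk = await queue.next()) !== null) {
    write(chunk);
  }
}

// Runs reader and writer concurrently so delivery overlaps with download.
async function streamThrough(body, write) {
  const queue = new ChunkQueue();
  await Promise.all([readInto(queue, body), writeFrom(queue, write)]);
}
```

The point of the design is that the writer never waits for the whole download: it only waits when the queue is momentarily empty.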

Making a difference

And if you wonder what difference this can make:

Content Proxy served about 42.3 million URLs last month which is quite an engineering challenge for us to make it both fast and scalable. Sometimes Content Proxy serves megabytes of data, which poses even greater engineering challenge. As such proxy gets large number of hits, if we can save on an average 100ms from each call, we can save 4.23 million seconds of download/upload/processing time every month. That's about 1175 man hours wasted throughout the world by millions of people staring at browser waiting for content to download.
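The arithmetic in the quote checks out. Spelled out, using only the figures Omar gives (the variable names are mine):

```javascript
const requestsPerMonth = 42300000; // 42.3 million URLs, from the quote
const savedPerRequestMs = 100;     // average saving per call, from the quote

const savedMs = requestsPerMonth * savedPerRequestMs;
const savedSeconds = savedMs / 1000;    // 4,230,000 seconds
const savedHours = savedSeconds / 3600; // 1,175 hours
```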

There is even more detail on how the proxy was created.

Ajax: Ajaxian

Cuzillion: Performance best practices tool

Steve Souders has released a nice little tool called Cuzillion, which has the tag line “’cuz there are a zillion pages to check”, although it could also be that there are a zillion ways to do Web development!

The tool lets you test out different techniques for optimizing performance in browsers, and these tests can be saved and shared by the community.

Steve explains how the tool came about:

I’m constantly thinking of or being asked about how browsers handle different sets of resources loaded in various ways. Before I would open an editor and build some test pages. Firing up a packet sniffer I would load these pages in different browsers to diagnose what was going on. I was starting my research on advanced techniques for loading scripts without blocking and realized the number of test pages needed to cover all the permutations was in the hundreds. That was the birth of Cuzillion.

Here Steve talks about some examples:

A great example of how Cuzillion is useful is looking at the impact inline scripts have when they follow a stylesheet in Internet Explorer. Typically, a stylesheet followed by any other resource results in both resources being downloaded in parallel in Internet Explorer. (In Firefox stylesheets block parallel downloads, so this performance optimization is only applicable to IE.) Here’s a Cuzillion page that shows this: stylesheet and image downloading in parallel. Both the stylesheet and image are configured to take 2 seconds to download. Since they download in parallel the page takes about 2 seconds to load as shown by the “page load time”.

But look what happens if we put an innocent inline script between the stylesheet and image: stylesheet, inline script, and image. Now, in Internet Explorer the stylesheet and image are downloaded sequentially, which means the page load time goes from 2 seconds to 4 seconds. If the inline script is simply moved above the stylesheet the two resources are downloaded in parallel again, and the page load goes back down to 2 seconds: inline script, stylesheet, and image.

This was a great discovery. But immediately my officemate asked if inline style blocks had the same effect. No problem. With Cuzillion I just do some clicks and drag-and-drop, and can test it out: stylesheet, inline style block, image. It turns out inline style blocks don’t cause stylesheets to block downloads.

The findings from a tool like Cuzillion are really valuable. The lessons learned from poking at inline scripts and stylesheets can save hundreds of milliseconds on page load times. And it’s a common problem. eBay, MSN.com, MySpace, and Wikipedia all suffer from this problem.

Much thanks to Google for letting me release this code under Open Source. It’s not currently on Google Code but if you want to contribute let me know and I’ll do that. Try it out and send me your feedback. And share your insights with others. We all want the Internet to be faster!
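The Internet Explorer behaviour Steve describes can be captured in a toy model. This is not how Cuzillion works and not a browser simulation; it only encodes the rule from the quote (an inline script waits for any stylesheet still downloading, and holds up whatever comes after it), with made-up type names.

```javascript
// Toy model of the Internet Explorer behaviour described in the quote.
// Each item is { type, seconds }; only stylesheets and images download.
function pageLoadSeconds(items) {
  let clock = 0;           // when the parser reaches the current item
  let stylesheetsDone = 0; // when the last pending stylesheet finishes
  let finished = 0;        // when the last download finishes
  for (const item of items) {
    if (item.type === 'inlineScript') {
      // The inline script cannot run until pending stylesheets are in.
      clock = Math.max(clock, stylesheetsDone);
    } else if (item.type === 'stylesheet' || item.type === 'image') {
      const end = clock + item.seconds;
      if (item.type === 'stylesheet') {
        stylesheetsDone = Math.max(stylesheetsDone, end);
      }
      finished = Math.max(finished, end);
    }
    // 'inlineStyle' neither downloads nor blocks in this model
  }
  return Math.max(finished, clock);
}

const css = { type: 'stylesheet', seconds: 2 };
const img = { type: 'image', seconds: 2 };
const script = { type: 'inlineScript' };
const style = { type: 'inlineStyle' };

const parallel = pageLoadSeconds([css, img]);           // 2
const blocked = pageLoadSeconds([css, script, img]);    // 4
const scriptFirst = pageLoadSeconds([script, css, img]); // 2
const withStyle = pageLoadSeconds([css, style, img]);   // 2
```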

Steve is talking at Web 2.0 Expo today at 1:30pm in room 2002. If you are in town, check it out and see Cuzillion in action!

Ajax: Ajaxian

jQuery and Prototype Benchmarks

Piotr Solnica did a couple of posts on jQuery and Prototype benchmarks back in the day, and John-David Dalton just found them.

In part one, he runs tests such as:

JAVASCRIPT:

  $('td.first').addClass('marked'); // jQuery

  // Prototype
  $$('td.first').each(function(cell){
    cell.addClassName('marked');
  });

  // or

  $$('td.first').invoke('addClassName', 'marked');

and concludes:

Executed tests show that Prototype seems to be faster than jQuery, with the exception of the new insertion method, whose performance should be improved. Although I like jQuery syntax more than Prototype, the performance is way more important than saving few lines of code. Of course tests that I made don’t show how these libraries act in a real application, which is my task for the next part(s) of this article. Despite the results I must admit that I’m very excited about jQuery, my general impression is that this library is more mature than Prototype.

In part two, Piotr uses a custom JavaScript-based testing environment instead of running tests using Firebug profiler. This allows the test suite to run in many browsers, and this time concludes:

Prototype was at least 2 times faster than jQuery in 15 cases, and jQuery was faster than Prototype in 8 cases. What library should I choose? In my case I will stick with Prototype, because it offers the same functionality as jQuery does + more and it’s faster. jQuery is probably better for projects where there’s a need for some fancy UI effects and that’s it, but it’s just an assumption, correct me if I’m wrong…

Ajax: Ajaxian

using.js: manage JavaScript dependencies

Jon Davis created Using.js, a simple library to manage dependencies with the goals of:

  • Separate script dependencies from HTML markup (let the script framework figure out the dependencies it needs, not the designer)
  • Make script referencing as simple and easy as possible (no need to manage the HTML files)
  • Lazy load the scripts and not load them until and unless they are actually needed at runtime.

To use the script you simply:

JAVASCRIPT:

  // potential scripts are pre-registered first
  using.register("jquery", "/scripts/jquery-1.2.3.js");

  // later, when actually needed
  using("jquery"); // loads jQuery and de-registers jQuery from using
  $("a").css("text-decoration", "none");

  // or asynchronously
  using("jquery", function() {
    $("a").css("text-decoration", "none");
  });

As we see more and more tactics for improving performance by controlling when scripts are loaded, I expect to see more libraries like this. The key is working out exactly which scripts need to be loaded right away, which can wait until the DOM is ready, and which can wait until later. How do you want to load the script? Dynamic script element? Via iframe? XHR + eval? They all have pros and cons.
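Of those options, the dynamic script element is probably the easiest to sketch. The following is a hypothetical registry in the spirit of using.js, not its actual source; `createLoader`, `register` and `use` are made-up names, and the document object is passed in so the sketch can be exercised outside a browser.

```javascript
// Hypothetical registry in the spirit of using.js; not its actual source.
function createLoader(doc) {
  const registered = {}; // name -> url
  const loaded = {};     // name -> true once the script has loaded
  const pending = {};    // name -> callbacks waiting for a load in progress

  function register(name, url) {
    registered[name] = url;
  }

  // Loads the script via a dynamic <script> element, then calls back.
  function use(name, callback) {
    if (loaded[name]) {
      callback();
      return;
    }
    if (pending[name]) {
      pending[name].push(callback); // already downloading: just queue up
      return;
    }
    const url = registered[name];
    if (!url) {
      throw new Error('Unknown script: ' + name);
    }
    pending[name] = [callback];
    const script = doc.createElement('script');
    script.src = url;
    script.onload = function () {
      loaded[name] = true;
      const callbacks = pending[name];
      delete pending[name];
      callbacks.forEach(function (cb) { cb(); });
    };
    doc.getElementsByTagName('head')[0].appendChild(script);
  }

  return { register: register, use: use };
}

// In a browser this would be: const loader = createLoader(document);
```

Note that older versions of Internet Explorer signal script completion through onreadystatechange rather than onload, so a production loader would need to handle both.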

Ajax: Ajaxian

Dromaeo: JavaScript Engine Testing


John Resig has put out Dromaeo, maybe a touch before he wanted to, because people were finding it :) The site currently hosts a subset of the WebKit SunSpider JavaScript engine tests, with the desire to push on and do a lot more.

You can run tests and then compare your own results. What is particularly cool about all of this, is how we can harvest performance data. If enough people start running this bad boy from different parts of the world, and different devices (especially mobile devices) we will get a nice picture of performance of JavaScript engines.

The Mozilla Wiki has more information which covers the methodology, changes the other browsers would like to see, how to download and run locally, and a lot more.

Cool stuff. I hope to see this expand beyond JavaScript too and get even more "real world".

Ajax: Ajaxian

Using a hash property for security and caching


Douglas Crockford would like to see a hash= attribute to aid security and performance:

Any HTML tag that accepts a src= or href= attribute should also be allowed to take a hash= attribute. The value of a hash attribute would be the base 32 encoding of the SHA of the object that would be retrieved. This does a couple of useful things.

First, it gives us confidence that the file that we receive is the one that we asked for, that it was not replaced or tampered with in transit.

Second, browsers can cache by hash code. If the cache contains a file that matches the requested hash=, then there is no need to go to the network regardless of the url. This would improve the performance of Ajax libraries because you would only have to download the library once for all of the sites you visit, even if every site links to its own copy.
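To see how the two benefits would fit together, here is a sketch for Node.js. No browser implements a hash= attribute, so everything here is hypothetical: the function names are made up, `fetchFn` stands in for a network request, and SHA-256 is my assumption (the proposal only says "the SHA").

```javascript
const crypto = require('crypto');

const BASE32_ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ234567';

// RFC 4648 base 32 encoding, without padding characters.
function base32(bytes) {
  let bits = 0;     // bit buffer
  let bitCount = 0; // number of valid bits in the buffer
  let out = '';
  for (const byte of bytes) {
    bits = (bits << 8) | byte;
    bitCount += 8;
    while (bitCount >= 5) {
      out += BASE32_ALPHABET[(bits >>> (bitCount - 5)) & 31];
      bitCount -= 5;
    }
    bits &= (1 << bitCount) - 1; // keep only the leftover bits
  }
  if (bitCount > 0) {
    out += BASE32_ALPHABET[(bits << (5 - bitCount)) & 31];
  }
  return out;
}

function hashOf(content) {
  // SHA-256 is an assumption; the proposal only says "the SHA".
  return base32(crypto.createHash('sha256').update(content).digest());
}

// Cache keyed by hash rather than by URL.
const cacheByHash = new Map();

// fetchFn(url) stands in for a network request and returns the content.
function loadWithHash(url, expectedHash, fetchFn) {
  if (cacheByHash.has(expectedHash)) {
    return cacheByHash.get(expectedHash); // no network needed, whatever the URL
  }
  const content = fetchFn(url);
  if (hashOf(content) !== expectedHash) {
    throw new Error('Hash mismatch for ' + url); // replaced or tampered with
  }
  cacheByHash.set(expectedHash, content);
  return content;
}
```

With this shape, two sites linking to their own copies of the same library would hit the network once, and a tampered file would be rejected before it is cached.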

Ajax: Ajaxian

Doloto: Code Splitting for Network-Bound Applications

I missed the Microsoft Research paper on Doloto: Code Splitting for Network-Bound Web 2.0 Applications:

Modern Web 2.0 applications, such as GMail, Live Maps, Facebook and many others, use a combination of Dynamic HTML, JavaScript and other Web browser technologies commonly referred to as AJAX to push page generation and content manipulation to the client web browser. This approach improves the responsiveness of these network-bound applications, but the shift of application execution from a back-end server to the client also often dramatically increases the amount of code that must first be downloaded to the browser. This creates an unfortunate Catch-22: to create responsive distributed Web 2.0 applications developers move code to the client, but for an application to be responsive, the code must first be transferred there, which takes time. In this paper, we present DOLOTO, a system that analyzes application workloads and automatically performs code splitting of existing large Web 2.0 applications. After being processed by DOLOTO, an application will initially transfer only the portion of code necessary for application initialization. The rest of the application’s code is replaced by short stubs; their actual function code is transferred lazily in the background or, at the latest, on-demand on first execution. Since code download is interleaved with application execution, users can start interacting with the Web application much sooner, without waiting for the code that implements extra, unused features. To demonstrate the effectiveness of DOLOTO in practice, we have performed experiments on five large widely-used Web 2.0 applications. DOLOTO reduces the size of initial application code download by hundreds of kilobytes or as much as 50% of the original download size. The time to download and begin interacting with large applications is reduced by 20-40% depending on the application and wide-area network conditions.

They take examples with varying degrees of download-i-ness and show how their system can or can't help:

[Figure: Doloto]

As we play with the techniques for getting the best performance, it does feel like we need an abstraction that handles dependencies, loads only what is TRULY needed right away in onload, and loads other payloads as required / later.

Ajax: Ajaxian

The importance of bandwidth versus latency

Between cranking on Acid 3 tests, the WebKit crew has explained some issues in Optimizing Page Loading in the Web Browser:

It is well understood that page loading speed in a web browser is limited by the available connection bandwidth. However, it turns out bandwidth is not the only limiting factor and in many cases it is not even the most important one.

From the figure it is clear that while available bandwidth is a significant factor, so is the connection latency. Introducing just 50ms of additional latency doubled the page loading time in the high bandwidth case (from ~3200ms to ~6300ms).

Antti Koivisto goes on to explain why latency has such an impact, and how it is related to the browser having to figure out “all the associated resources” for a page, and the bane of document.write():

Problems start when a document contains references to external scripts. Any script can call document.write(). Parsing can’t proceed before the script is fully loaded and executed and any document.write() output has been inserted into the document text. Since parsing is not proceeding while the script is being loaded no further requests for other resources are made either. This quickly leads to a situation where the script is the only resource loading and connection parallelism does not get exploited at all. A series of script tags essentially loads serially, hugely amplifying the effect of latency.

The situation is made worse by scripts that load additional resources. Since those resources are not known before the script is executed it is critical to load scripts as quickly as possible. The worst case is a script that loads more scripts (by using document.write() to write <script> tags), a common pattern in JavaScript frameworks and ad scripts.
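A back-of-the-envelope model shows why serial script loading amplifies latency. This is a toy formula of my own, not WebKit's measurements: each round of requests costs one round trip, all bytes share the same bandwidth, and the numbers in the example are made up.

```javascript
// Toy model: each round of requests costs one round trip of latency, and
// all bytes share the same bandwidth. Real browsers are more complicated.
function loadTimeMs(resourceCount, parallelism, latencyMs, totalBytes, bytesPerMs) {
  const rounds = Math.ceil(resourceCount / parallelism);
  return rounds * latencyMs + totalBytes / bytesPerMs;
}

// Ten scripts, 200 KB in total, on a 1000 bytes/ms link with 50 ms latency.
const serial = loadTimeMs(10, 1, 50, 200000, 1000);   // 10 * 50 + 200 = 700
const parallel = loadTimeMs(10, 4, 50, 200000, 1000); // 3 * 50 + 200 = 350
```

In this model the transfer time is the same in both cases; the whole difference comes from how many times the latency is paid.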

This is where WebKit comes into the picture. They have put in some nice optimizations:

The latest WebKit nightlies contain some new optimizations to reduce the impact of network latency. When script loading halts the main parser, we start up a side parser that goes through the rest of the HTML source to find more resources to load. We also prioritize resources so that scripts and stylesheets load before images. The overall effect is that we are now able to load more resources in parallel with scripts, including other scripts.

Ajax: Ajaxian

Yahoo! releases new performance best practices

Stoyan Stefanov has been working with the Yahoo! engineers to find more best practices, and presented on a new batch:


He covers the existing 14 rules, plus 20 new rules for faster web pages. We’ve categorized the optimizations into: server, content, cookie, JavaScript, CSS, images, and mobile.

Here are the new items, with details on them coming soon:

1. Flush the buffer early [server]
2. Use GET for AJAX requests [server]
3. Post-load components [content]
4. Preload components [content]
5. Reduce the number of DOM elements [content]
6. Split components across domains [content]
7. Minimize the number of iframes [content]
8. No 404s [content]
9. Reduce cookie size [cookie]
10. Use cookie-free domains for components [cookie]
11. Minimize DOM access [javascript]
12. Develop smart event handlers [javascript]
13. Choose <link> over @import [css]
14. Avoid filters [css]
15. Optimize images [images]
16. Optimize CSS sprites [images]
17. Don't scale images in HTML [images]
18. Make favicon.ico small and cacheable [images]
19. Keep components under 25K [mobile]
20. Pack components into a multipart document [mobile]

Is it just me, or is performance getting a LOT of attention these days?

Ajax: Ajaxian

Roundup on Parallel Connections

Steve Souders has taken a step back, analyzed the blog content that came out of IE 8 supporting 6 connections per host, and has pulled together the facts to discuss:

HTTP/1.1 RFC

Section 8.1.4 of the HTTP/1.1 RFC says a “single-user client SHOULD NOT maintain more than 2 connections with any server or proxy.” The key here is the word “should.” Web clients don’t have to follow this guideline. IE8 isn’t the first to exceed this guideline. Opera and Safari hold that honor supporting 4 connections per server.

Settings for current browsers

Steve documents the browsers, and discusses how you can tweak the settings in your own: "It’s possible to reconfigure your browser to use different limits."

Upper bound of open connections

This Max Connections test page contains 180 images split across 30 hostnames. That works out to 6 images per hostname. To determine the upper bound of open connections a browser supports, I loaded this page and counted the number of simultaneous requests in a packet sniffer. Firefox 1.5 and 2.0 open a maximum of 24 connections (2 connections per hostname across 12 hostnames). This limit is imposed by Firefox’s network.http.max-connections setting which defaults to a value of 24.

In IE 6, 7 and 8 I wasn’t able to determine the upper bound. At 2 connections per server, IE 6 and 7 opened 60 connections in parallel. At 6 connections per server, IE8 opened 180 connections in parallel.
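The numbers Steve reports follow from a simple rule: connections per hostname times hostnames, capped by any global limit. As a sketch (the function name is mine, and `Infinity` stands for "no cap observed", since Steve could not find one for IE):

```javascript
// Parallel connections a browser would open on the 30-hostname test page.
function maxParallelConnections(perHost, hostnames, globalCap) {
  return Math.min(perHost * hostnames, globalCap);
}

const firefox2 = maxParallelConnections(2, 30, 24);  // capped at 24
const ie7 = maxParallelConnections(2, 30, Infinity); // 60
const ie8 = maxParallelConnections(6, 30, Infinity); // 180
```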

Effect of proxies

Note that if you’re behind a proxy (at work, etc.) your download characteristics change. If web clients behind a proxy issued too many simultaneous requests an intelligent web server might interpret that as a DoS attack and block that IP address. Browser developers are aware of this issue and throttle back the number of open connections.

Will this break the Internet?

Much of the debate in the blog comments has been about how IE8’s increase in the number of connections per server might bring those web servers to their knees. I found the most insightful comments in Mozilla’s Bugzilla discussion about increasing Firefox’s number of connections. In comment #22 Boris Zbarsky lays out a good argument for why this increase will have no effect on most servers. But in comment #23 Mike Hommey points out that persistent connections are kept open for longer than the life of the page request. This last point scares me. As someone who has spent many hours configuring Apache to find the right number of child processes across banks of servers, I’m not sure what impact this will have.

Having said that, I’m pleased that IE8 has taken this step and I’d be even happier if Firefox followed suit. This change in the client will improve page load times from the user’s perspective. It does put the onus on backend developers to watch closely as IE8 adoption grows to see if it affects their capacity planning. But I’ve always believed that part of the responsibility and joy of being a developer is doing extra work on my side that can improve the experience for thousands or millions of users. This is another opportunity to do just that.

Ajax: Ajaxian

IE 8 Connection Parallelism Issues

Now that we have our hands on IE 8 beta, we see developers testing the various features. Ryan Breen continues IE 8 tests on the new connection limits and parallelism:

A few weeks ago, I discussed IE8’s improved connection parallelism, specifically the increase from 2 concurrent connections per host to 6. One open question was the total number of connections allowed — my speculation was that the IE team would stick with a max of 6 rather than triple that value as well.

I was wrong. The new max is an astonishing 18 concurrent connections (editor: limited to 3 hostnames, you can get more concurrent connections with more hostnames).

This is actually slightly faster than the CNAME trick applied to previous IE versions as it does not incur any hostname resolution cost when establishing the first connections.

Sounds good! However, Ryan then discovered some strange results. When running tests against an application hosted on Dreamhost, some downloads were very slow indeed, resulting in worse performance than before. Ryan has a hypothesis:

I suspect that my hosting provider (Dreamhost) simply can’t keep up with the dramatic increase in connection parallelism. 18 connections is simply too much of a good thing, and it will present a scaling problem for those who are on small to medium hosts. 10 users hitting at the same time will yield 180 concurrent connections, a pretty significant number for smaller providers.

Dial-up and cellular network users are also likely to be negatively impacted by this change. In the high broadband world where latency is the dominant factor, greater connection parallelism is a boon. But in bandwidth constrained networks, it just leads to thrash where progress is slowed by all the connections trying to share a small pipe.

I’m curious what sort of testing Microsoft has conducted to determine the impact of this change. The connection parallelism approach is used widely (including by the Virtual Earth team), and some servers may not be ready for the increase. My tests were conducted against only one host, but if similar results are experienced elsewhere, this may fall under the rubric of “don’t break the web.”

Ajax: Ajaxian

Lessons learned from improving Google Code web site performance

We went through Google Code and did a lot of work to get it running faster.

The team used a lot of the principles from Steve Souders' book, High Performance Web Sites, and ended up with a nice gain:

According to our latency measurement stats, the user-perceived latency on Google Code dropped quite a bit, anywhere between 30% and 70% depending on the page. This is a huge return for relatively small investments we've made along the way, and we hope you'll find these techniques useful for your own web development as well.

The changes were low-hanging fruit that most sites could also implement:

  • Combined and minimized JavaScript and CSS files used throughout the site
  • Implemented CSS sprites for frequently-used images
  • Implemented lazy loading of Google AJAX APIs loader module (google.load)

Ajax: Ajaxian

Firefox 3 Memory Usage

Stuart Parmenter has been blogging about his work on memory usage and various malloc() libraries and their tradeoffs.

In his latest, he talks about the memory usage in Firefox 3 today and the work that he has done:

  • Reduced Memory fragmentation: One of the things we did to help was to minimize the number of total allocations we did, to avoid unnecessarily churning memory. We’ve managed to reduce allocations in almost all areas of our code base.
  • Fixed cycles with the Cycle collector: For Gecko 1.9, we’ve implemented an automated cycle collector that can recognize cycles in the in-memory object graph and break them automatically.
  • Tuned our caches: In many cases we’ve added expiration policies to our caches which give performance benefits in the most important cases, but don’t eat up memory forever. We now expire cached documents in the back/forward cache after 30 minutes since you likely won’t be going back to them anytime soon. We have timer based font caches as well as caches of computed text metrics that are very short lived.
  • Adjusted how we store image data: In Firefox 3, thanks to some work by Federico Mena-Quintero (of GNOME fame), we now throw away the uncompressed data after it hasn’t been used for a short while. Another fantastic change from Alfred Kayser changed the way we store animated GIFs so that they take up a lot less memory. We now store the animated frames as 8bit data along with a palette rather than storing them as 32 bits per pixel.
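The cache tuning in the third bullet boils down to attaching an expiration policy to each entry. A minimal sketch of that idea follows; it is not Gecko's implementation, the class name is made up, and entries here are only purged when they are next looked up, whereas a real cache would more likely use a timer.

```javascript
// Hypothetical time-based cache; not Gecko's implementation.
class ExpiringCache {
  constructor(maxAgeMs, now) {
    this.maxAgeMs = maxAgeMs;
    this.now = now || (() => Date.now()); // injectable clock for testing
    this.entries = new Map();             // key -> { value, storedAt }
  }

  set(key, value) {
    this.entries.set(key, { value: value, storedAt: this.now() });
  }

  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() - entry.storedAt >= this.maxAgeMs) {
      this.entries.delete(key); // expired: free the memory
      return undefined;
    }
    return entry.value;
  }
}

// 30 minutes, the back/forward cache lifetime mentioned above.
const THIRTY_MINUTES = 30 * 60 * 1000;
```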

What about the tests?

For the results below we loaded 29 different web pages through 30 windows over 11 cycles (319 total page loads), always opening a new window for each page load (closing the oldest window alive once we hit 30 windows). At the end we close all the windows but one and let the browser sit for a few minutes to see if they will reclaim memory, clear short-term caches, etc. There is a 3 second delay between page loads to try and get all the browsers to take the same amount of time.

Great work guys, and thanks for talking to us about how you are doing this work!

Ajax: Ajaxian

IE 8 and Performance

Steve Souders has posted on IE 8 and performance improvements.

One new nugget of information that I haven't seen anywhere else is the fact that scripts are now loaded in parallel (and execution is still serial of course):

Increasing parallel downloads makes pages load faster. (For users with slower CPUs or Internet connections it could possibly be worse, but for most users it’s faster.) The HTTP 1.1 spec recommends that browsers only download two items in parallel per hostname, but the spec was written in 1999. Today’s clients and servers can support more parallel downloads, so IE8 has increased the number of downloads per hostname from 2 to 6.

Increasing parallel downloads makes pages load faster, which is why downloading external scripts (.js files) is so painful. Firefox and IE7 and earlier won’t start any parallel downloads while downloading an external script. These days, with the greater adoption of Web 2.0 and DHTML, many sites contain multiple scripts which means those pages will have long periods where all other downloads are blocked. It’s understandable that these scripts need to be executed sequentially (code dependencies) but there’s no reason they couldn’t be downloaded in parallel. And that’s exactly what IE8 has done. It’s the first browser I’ve seen that has implemented this critical improvement for load times. Facebook has got to be thankful for this. They have 17 external scripts on their page. In most browsers this causes the page to load slowly for users coming in with an empty cache. But for users coming in using IE8 the scripts load ~80% faster because they’re loaded in parallel. In this screenshot showing HTTP requests for Facebook we see parallel script loading, and we also see them loading 6 at-a-time. Both of these IE8 enhancements dramatically speed up pages.
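A back-of-the-envelope model shows why parallel script downloads matter so much. The sketch below is a deliberate simplification, not a measurement: it assumes every script takes the same time to download, that all scripts come from one hostname, that bandwidth is not the bottleneck, and it ignores execution time. The function names are made up for illustration.

```python
import math


def serial_time(num_scripts, seconds_per_script):
    """Each script download blocks everything else, so the times simply add up."""
    return num_scripts * seconds_per_script


def parallel_time(num_scripts, seconds_per_script, max_connections):
    """Downloads proceed in waves of at most max_connections scripts."""
    waves = math.ceil(num_scripts / max_connections)
    return waves * seconds_per_script


def savings(num_scripts, seconds_per_script, max_connections):
    """Fraction of download time saved by going from serial to parallel."""
    before = serial_time(num_scripts, seconds_per_script)
    after = parallel_time(num_scripts, seconds_per_script, max_connections)
    return 1 - after / before
```

For 17 scripts and 6 connections the model needs 3 waves instead of 17 sequential downloads, a saving of about 82%. That is in the same range as the ~80% quoted above, although real pages have scripts of different sizes and other resources competing for the same connections.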

This dovetails nicely with the other items that we have already heard about:

  • 6 downloads per host
  • data: URIs, which means you can embed your rounded corners

I wonder if IE 8 has a total maxconnections limit?

Ajax: Ajaxian

IE and WebKit Performance; Is WebKit the Ralph Nader of Browsers?

We have seen a barrage of performance and compliance information this week, haven't we? Wow. We got a little more yesterday too.

The WebKit team talked about the Acid 3 test and how they are up to 90/100:

Support for CSS3 Selectors

We added support for all of the remaining CSS3 selectors. These include selectors like nth-child, nth-of-type, last-child, last-of-type, etc. These selectors were already implemented in KHTML, and the KHTML developers had even kindly provided patches for us in the relevant WebKit bugs. Therefore it was a simple matter of taking those patches, updating them to the WebKit codebase, and then merging them in. A big thanks to the KHTML developers for their hard work in this area.

Parsing Bugs

WebKit had a number of minor parsing bugs that Acid 3 targeted. The boxes did not render properly because of an obscure parsing bug that the test exploited (thanks, Hixie). In addition a number of other parsing bugs kept us from completely passing individual tests. We have updated our parser to be much closer to the HTML5-specified parsing rules.

WebKit has also never parsed DOCTYPEs before. I re-wrote WebKit’s DOCTYPE parsing to match the HTML5 specification, and so now if you put a DOCTYPE into your page it will be present in the DOM. In addition many bugs centered around proper mode resolution (quirks vs. strict) have now been fixed. You can document.write a DOCTYPE for example in a new document and have the correct mode be selected.

SVG

Acid3 has many SVG tests. We’ve been hard at work making these tests pass. In particular SVG font support and other aspects of the SVG DOM have been tested. Many of the remaining 10 points are SVG failures. We’ll be working on SVG animation in order to pass the last few SVG tests.

DOM

Acid3 tests a lot of DOM level 2 features, like traversal and ranges. It particularly focuses on the “liveness” of objects, e.g., making sure everything updates properly when you dynamically change a document by adding/removing nodes. Most of our failures in this area had to do with not behaving properly in the presence of these dynamic changes (even though we tended to pass the more static tests).

The JScript team also blogged about JScript improvements including fixing String concatenation.

Prior to this optimization of string concatenation, most developers used Array join operations to achieve the same result. We were aware of this tradeoff and made sure that the Array operations are also optimized. In either scenario, developers will experience significant performance gains.

I would love to see a benchmark of + vs. join() and hopefully see that join isn't needed anymore. This feels like some of the moments in Java where the verbose code that you wrote to make things faster started to backfire (e.g. String concat too, and object pooling, as creating an object became so cheap).

Ralph Nader of Browsers

This is a very random thought. Watching the WebKit and Firefox teams grinding away makes me wonder if one of them is like the Ralph Nader of browsers. Does WebKit take away share from Firefox? I have seen many developers prefer it recently, and a lot use both (myself for one). Having said that, the bulk of users are probably the folk who buy a Mac and click on the browser icon and don't really care.

Is having the third candidate in the race a good thing? Does competition between WK and FF itself help both projects and spur them on to greater heights, or would it be even better to have a meeting of the minds and merge the WebKit and Gecko teams, at least in a way where they both aren't optimizing the JavaScript engine etc.? Hmm.

NOTE: There is still a lot of room to innovate in the browser itself, but share the low level engines. Maybe it is mostly to do with personalities. If the teams could work together that would be one thing (and remember Dave H used to work on Firefox), but if not... then it is obviously silly. What do you think?

Ajax: Ajaxian

How green is your Web site?

Steve Souders, the Web performance chap, has been inspired to calculate how green your website is based on the correlation between fast pages and energy:

Intrigued by an article on Radar about co2stats.com, I looked at my web performance best practices from the perspective of power consumption and CO2 emissions. YSlow grades web pages according to how well they follow these best practices. What if it could convert those grades into kilowatt-hours and pounds of CO2?

Let’s look at one performance rule on one site. Wikipedia is one of the top ten sites in the world (#9 according to Alexa). I love Wikipedia. I use it almost every day. Unfortunately, it has thirteen images in the front page that don’t have a far future Expires header (Rule 3). Every time someone revisits this page the browser has to make thirteen HTTP requests to the Wikipedia server to check if these images are still usable, even though these images haven’t changed in over seven months on average. A better way to handle this would be for Wikipedia to put a version number in the image’s URL and change the version number whenever the image changes. Doing this would allow them to tell the browser to cache the image for a year or more (using a far future Expires or Cache-Control header). Not only would this make the page load faster, it would also help the environment. Let’s try to estimate how much.

  • Let’s assume Wikipedia does 100 million page views/day. (I’ve seen estimates that are over 200 million/day.)
  • Assume 80% of those page views are done with a primed cache (based on Yahoo!’s browser cache statistics). We’re down to 80M page views/day.
  • Assume 10%, no, 5% of those are for the home page. We’re down to 4M page views/day for the home page with a primed cache. Each of those contains 13 HTTP requests to validate the images, for a total of 52M image validation requests/day.
  • Assume one web server can handle 100 of these requests/second, or 8.6M requests/day. That’s six web servers running full tilt year-round to handle this traffic.
  • Assume a fully loaded server uses 100W. Six servers, year-round, consume 5,000 kilowatt-hours per year or approximately 500-1000 pounds of CO2 emissions.
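The estimate above is easy to reproduce. The following sketch just replays the bullet points as arithmetic; the function and parameter names are invented for the example, and the inputs are the assumptions from the quote, not measured values. The CO2 figure is left out because the quote does not give a conversion factor.

```python
def estimate_validation_cost(
    page_views_per_day=100_000_000,
    primed_cache_pct=80,
    home_page_pct=5,
    images_per_page=13,
    requests_per_server_per_second=100,
    watts_per_server=100,
):
    """Replay the estimate: validation requests/day, servers needed, kWh/year."""
    primed_views = page_views_per_day * primed_cache_pct // 100
    home_page_views = primed_views * home_page_pct // 100
    validation_requests = home_page_views * images_per_page

    seconds_per_day = 24 * 60 * 60
    capacity_per_server = requests_per_server_per_second * seconds_per_day

    # Rounded to the nearest whole server, as the quoted estimate does.
    servers = round(validation_requests / capacity_per_server)

    hours_per_year = 24 * 365
    kwh_per_year = servers * watts_per_server * hours_per_year / 1000
    return validation_requests, servers, kwh_per_year
```

With the default inputs this yields 52,000,000 validation requests per day, 6 servers, and 5,256 kWh per year, consistent with the "5,000 kilowatt-hours" in the quote. Rounding the server count up instead of to the nearest integer would give seven servers, since the exact ratio is slightly above six.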

Wow, even more pressure for you to not be sloppy with your apps. Who knows if Greenpeace will be using YSlow to find culprits and start boycotts because of it ;)

NOTE: Steve is co-chair of the O'Reilly Velocity conference which is taking place on June 23-24.

Ajax: Ajaxian

Backbase tests browser JavaScript and Render performance including IE 8

Sjoerd Mulder of Backbase ran a couple of performance tests on a slew of browsers, including IE 8 beta. He tested both the JavaScript performance, and the rendering performance:

A lot of respect and thanks to all the browser teams pushing the boundary of performance. I think it's an awesome result that the current nightlies of both WebKit and Gecko are 2x / 3x faster than the current production versions. Can't wait until Firefox 3 is released to the public!

It seems the IE team still needs to do a lot of optimization in the render engine, but still we have to consider that it's a beta 1. So let's hope they either improve some more or everybody switches to Firefox / Safari.

JavaScript Performance

The first section exercises the Backbase JavaScript engine in various scenarios, for instance:

  • TDL Compiling / parsing
  • Javascript XPath 2.0 Engine
  • XEL execution / event propagation

Backbase Performance JavaScript

Render Performance (DOM)

The second part is rendering widgets, for example:

  • Rendering 1 accordion
  • Rendering 5 accordions
  • Rendering a listGrid
  • etc.

Backbase Performance Render

The full specification and timings are available to see.

Ajax: Ajaxian

Javascript Library Performance Test Roundup

This is a test system built by members of the PBwiki engineering team in order to gather data about browser and javascript library performance.

Dojo: del.icio.us tag dojo

Firefox 3 Performance Numbers

Moving on from the "let me use that API" conversation and onto some real stuff, urandom (thanks for the comment) let us know about the Cybernet News article on Firefox 3 performance.

They are reporting that Firefox 3 is now faster than Safari 3, and is close to WebKit nightly in certain benchmarks. I can just picture Steve coming down on people saying "we market this as the fastest browser on the planet!" which is tough, as no one stays the fastest forever. It is a race, and I am sure that WebKit and Firefox will be switching spots a lot in the coming years.

Again, this is great news for developers. I am running WebKit and Firefox 3b3 and I am really happy with both. For some tasks I choose one over another (e.g. Firebug, Greasemonkey vs. lean and mean).

I’m sure what most of you care the most about are the facts, and so I’ve compiled the results of the SunSpider JavaScript Benchmark test for each of the different browsers. All of the tests below were performed on the same Windows machine, and the Firefox 3 nightly builds definitely came out on top. Here are the results sorted from best to worst (each one is hyperlinked to the full stats):

  1. Firefox 3 Nightly (PGO Optimized):