Steve Souders has another insightful post where he discusses splitting the initial payload for the JavaScript in your page / application.
Steve first outlines how JavaScript can affect how the browser renders a page:
The growing adoption of Ajax and DHTML means today’s web pages have more JavaScript than ever before. The average top ten U.S. web site[1] contains 252K of JavaScript. JavaScript slows pages down. When the browser starts downloading or executing JavaScript it won’t start any other parallel downloads. Also, anything below an external script is not rendered until the script is completely downloaded and executed. Even in the case where external scripts are cached the execution time can still slow down the user experience and thwart progressive rendering.
He then took the Alexa top ten websites and tracked how much of the code was executed before the onload event, based on functions called. The results are below:

Now, it is easy to understand why this is the case. There are factors such as the simplicity in putting the code in one file, and feeling like the cache effects will make the point moot (which Steve argues against). Steve gets this:
The task of finding where to split a large set of JavaScript code is not trivial. Doloto, a project from Microsoft Research, attempts to automate this work. Doloto is not publicly available, but the paper provides a good description of their system. (You can hear the creators talk about Doloto at the upcoming Velocity conference.) The approach taken by Doloto uses stub functions that download additional JavaScript on demand. This might result in users having to wait when they trigger an action that requires additional functionality. Downloading the additional JavaScript immediately after the page has rendered might result in an even faster page.
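The stub idea can be sketched roughly as follows. This is only an illustration of the general technique, not Doloto's actual implementation: `makeStub` and the `loader` callback are hypothetical names, and a real loader would fetch and evaluate a script over the network rather than answer instantly.

```javascript
// Hypothetical sketch of an on-demand stub; not Doloto's real code.
// `loader(name, done)` is assumed to fetch the real function and pass it to `done`.
function makeStub(name, loader) {
  let real = null;    // the real implementation, once loaded
  let waiting = null; // calls queued while the download is in flight

  return function stub(...args) {
    const callback = args.pop(); // the last argument receives the result
    if (real) {
      callback(real(...args));
      return;
    }
    if (waiting) {
      waiting.push({ args, callback });
      return;
    }
    waiting = [{ args, callback }];
    loader(name, function (impl) {
      real = impl;
      const queued = waiting;
      waiting = null;
      queued.forEach((call) => call.callback(real(...call.args)));
    });
  };
}

// Usage with a fake loader that "downloads" instantly.
let downloads = 0;
const fakeLoader = (name, done) => {
  downloads += 1;
  done((a, b) => a + b);
};
const add = makeStub("add", fakeLoader);
add(2, 3, (sum) => console.log(sum)); // first call triggers the load
add(4, 5, (sum) => console.log(sum)); // later calls reuse the loaded code
```

The first call pays for the download, which is the wait Steve mentions. Calling the loader right after the page has rendered, instead of on first use, would hide that wait.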
Tom Trenka has detailed a great analysis of JavaScript performance across the various browsers. This is important work, and it reminded me of the JVM days where people tried to use pools and such... to find out that they were bad for performance as newer VM technology came out. When you try to be too tricky you can end up in a bad state as new versions try to optimize the common task, and not your trick.
Eugene Lazutkin had a great explanation on Strings in languages:
Many modern languages (especially functional languages) employ “the immutable object” paradigm, which solves a lot of problems including the memory conservation, the cache localization, addressing concurrency concerns, and so on. The idea is simple: if object is immutable, it can be represented by a reference (a pointer, a handle), and passing a reference is the same as passing an object by value — if object is immutable, its value cannot be changed => a pointer pointing to this object can be dereferenced producing the same original value. It means we can replicate pointers without replicating objects. And all of them would point to the same object. What do we do when we need to change the object? One popular solution is to use Copy-on-write (COW). Under COW principle we have a pointer to the object (a reference, a handle), we clone the object it points to, we change the value of the pointer so now it points to the cloned object, and proceed with our mutating algorithm. All other references are still the same.
JavaScript performs all of “the immutable object” things for strings, but it doesn’t do COW on strings because it uses even simpler trick: there is no way to modify a string. All strings are immutable by virtue of the language specification and the standard library it comes with. For example: str.replace("a", "b") doesn’t modify str at all, it returns a new string with the replacement executed. Or not. If the pattern was not found I suspect it’ll return a pointer to the original string. Every “mutating” operation for strings actually returns a new object leaving the original string unchanged: replace, concat, slice, substr, split, and even exotic anchor, big, bold, and so on.
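A quick way to see this immutability is to call a few "mutating" methods and check the original afterwards. The variable names here are just for illustration.

```javascript
const original = "banana";
const replaced = original.replace("a", "b"); // a string pattern replaces the first match only
const upper = original.toUpperCase();

console.log(original); // "banana" (unchanged)
console.log(replaced); // "bbnana"
console.log(upper);    // "BANANA"
```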
Tom then went on to detail the rounds that he went through:
There are a few surprises here, and Tom later concludes:
Erik Arvidsson reminds us of the reason to use push(): IE6 and its really bad GC.
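For reference, the two string-building styles being compared look roughly like this. The function names are made up for this sketch. Which one is faster depends on the browser and engine, which is exactly the point of running the numbers.

```javascript
// Style 1: repeated concatenation. Conceptually each += yields a new string,
// although engines are free to optimize this.
function buildWithConcat(parts) {
  let out = "";
  for (let i = 0; i < parts.length; i++) {
    out += parts[i];
  }
  return out;
}

// Style 2: collect the pieces with push() and join once at the end.
function buildWithPush(parts) {
  const buffer = [];
  for (let i = 0; i < parts.length; i++) {
    buffer.push(parts[i]);
  }
  return buffer.join("");
}

console.log(buildWithConcat(["a", "b", "c"]) === buildWithPush(["a", "b", "c"]));
```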
I look forward to the IE 8 / FF 3 results too.
NetBeans 6.1 was just released and it is both a feature and a performance release. The feature part has to do with JavaScript support - the language everybody loves to hate (see Roberto's talk at JavaOne), improved MySQL support, Spring Framework support, Hibernate support, Axis 2, Sailfin support, and Jersey (RESTful Web Services) support.
It's also bringing back features lost in the translation from 5.5 to 6.0 such as JavaBeans support and the JSF CRUD Generator. NetBeans also now provides a more natural way to share libraries. All in all a lot of web and server-side features including support for the latest GlassFish v2ur2 release. The full list of features is here.
Performance is related to startup-time, completion speed, and memory consumption. Coming attractions include PHP support, JavaScript debugger, Groovy/Grails support, and more.
As always, the nice download matrix is available here. Congratulations to the team for yet another solid release!
Steve Souders has released a nice little tool called Cuzillion which has the tag line of ‘cuz there are zillion pages to check’, although it could also be that there are a zillion ways to do Web development!
The tool lets you test out different techniques for optimizing performance in browsers, and these tests can be saved and shared by the community.
Steve explains how the tool came about:
I’m constantly thinking of or being asked about how browsers handle different sets of resources loaded in various ways. Before I would open an editor and build some test pages. Firing up a packet sniffer I would load these pages in different browsers to diagnose what was going on. I was starting my research on advanced techniques for loading scripts without blocking and realized the number of test pages needed to cover all the permutations was in the hundreds. That was the birth of Cuzillion.
Here Steve talks about some examples:
A great example of how Cuzillion is useful is looking at the impact inline scripts have when they follow a stylesheet in Internet Explorer. Typically, a stylesheet followed by any other resource results in both resources being downloaded in parallel in Internet Explorer. (In Firefox stylesheets block parallel downloads, so this performance optimization is only applicable to IE.) Here’s a Cuzillion page that shows this: stylesheet and image downloading in parallel. Both the stylesheet and image are configured to take 2 seconds to download. Since they download in parallel the page takes about 2 seconds to load as shown by the “page load time”.
But look what happens if we put an innocent inline script between the stylesheet and image: stylesheet, inline script, and image. Now, in Internet Explorer the stylesheet and image are downloaded sequentially, which means the page load time goes from 2 seconds to 4 seconds. If the inline script is simply moved above the stylesheet the two resources are downloaded in parallel again, and the page load goes back down to 2 seconds: inline script, stylesheet, and image.
This was a great discovery. But immediately my officemate asked if inline style blocks had the same effect. No problem. With Cuzillion I just do some clicks and drag-and-drop, and can test it out: stylesheet, inline style block, image. It turns out inline style blocks don’t cause stylesheets to block downloads.
The findings from a tool like Cuzillion are really valuable. The lessons learned from poking at inline scripts and stylesheets can save hundreds of milliseconds on page load times. And it’s a common problem. eBay, MSN.com, MySpace, and Wikipedia all suffer from this problem.
Much thanks to Google for letting me release this code under Open Source. It’s not currently on Google Code but if you want to contribute let me know and I’ll do that. Try it out and send me your feedback. And share your insights with others. We all want the Internet to be faster!
Steve is talking at Web 2.0 Expo today at 1:30pm in room 2002. If you are in town, check it out and see Cuzillion in action!
Piotr Solnica did a couple of posts on jQuery and Prototype benchmarks back in the day, and John-David Dalton just found them.
In part one, he runs tests such as:
and concludes:
Executed tests show that Prototype seems to be faster than jQuery, with the exception of the new insertion method, whose performance should be improved. Although I like jQuery syntax more than Prototype, the performance is way more important than saving a few lines of code. Of course tests that I made don’t show how these libraries act in a real application, which is my task for the next part(s) of this article. Despite the results I must admit that I’m very excited about jQuery, my general impression is that this library is more mature than Prototype.
In part two, Piotr uses a custom JavaScript-based testing environment instead of running tests using Firebug profiler. This allows the test suite to run in many browsers, and this time concludes:
Prototype was at least 2 times faster than jQuery in 15 cases, and jQuery was faster than Prototype in 8 cases. What library should I choose? In my case I will stick with Prototype, because it offers the same functionality as jQuery does + more and it’s faster. jQuery is probably better for projects where there’s a need for some fancy UI effects and that’s it, but it’s just an assumption, correct me if I’m wrong…
Jon Davis created Using.js, a simple library to manage dependencies with the goals of:
To use the script you simply:
As we see more and more tactics for getting performance by doing tricks with when scripts are loaded, I expect to see more libraries like this. The key is working out exactly what script needs to be loaded right away, what can load once the DOM is around, and what can wait for later. How do you want to load the script? Dynamic script element? Via iframe? XHR + eval? They all have pros and cons.
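As a point of reference, the dynamic script element option can be sketched like this. `loadScript` is a hypothetical helper, not part of Using.js, and the optional `doc` parameter exists only so the function can be exercised outside a browser.

```javascript
// Hypothetical helper: load a script by appending a <script> element to <head>.
// `doc` defaults to the browser's document; pass a stand-in when testing.
function loadScript(url, onLoad, doc) {
  doc = doc || document;
  const script = doc.createElement("script");
  script.src = url;
  script.onload = function () {
    if (onLoad) onLoad(url);
  };
  doc.getElementsByTagName("head")[0].appendChild(script);
  return script;
}

// In a page you might call (the path is made up):
//   loadScript("/js/extra.js", function (url) { console.log(url + " is ready"); });
```

Older versions of IE report script loading through onreadystatechange rather than onload, so a production loader would need to handle both.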
John Resig has put out Dromaeo, maybe a touch before he wanted to, due to people finding it :) The site hosts a subset of the WebKit SunSpider JavaScript engine tests right now, with the desire to push on and do a lot more.
You can run tests and then compare your own results. What is particularly cool about all of this, is how we can harvest performance data. If enough people start running this bad boy from different parts of the world, and different devices (especially mobile devices) we will get a nice picture of performance of JavaScript engines.
The Mozilla Wiki has more information which covers the methodology, changes the other browsers would like to see, how to download and run locally, and a lot more.
Cool stuff. I hope to see this expand beyond JavaScript too and get even more "real world".
Sjoerd Mulder of Backbase ran a couple of performance tests on a slew of browsers, including IE 8 beta. He tested both the JavaScript performance, and the rendering performance:
A lot of respect and thanks to all the browser teams pushing the boundary of performance. I think it's an awesome result that the current nightlies of both Webkit and Gecko are 2x / 3x faster than the current production versions. Can't wait until Firefox 3 is released to the public!
It seems the IE team still needs to do a lot of optimization in the render engine, but still we have to consider that it's a beta 1. So let's hope they either improve some more or everybody switches to Firefox / Safari.
JavaScript Performance
The first section tests the Backbase JavaScript engine in various scenarios, for instance:

Render Performance (DOM)
The second part is rendering widgets, for example:

The full specification and timings are available to see.

Moving on from the "let me use that API" conversation and onto some real stuff, urandom (thanks for the comment) let us know about the Cybernet News article on Firefox 3 performance.
They are reporting that Firefox 3 is now faster than Safari 3, and is close to WebKit nightly in certain benchmarks. I can just picture Steve coming down on people saying "we market this as the fastest browser on the planet!" which is tough, as no one stays the fastest forever. It is a race, and I am sure that WebKit and Firefox will be switching spots a lot in the coming years.
Again, this is great news for developers. I am running WebKit and Firefox 3b3 and I am really happy with both. For some tasks I choose one over another (e.g. Firebug, Greasemonkey vs. lean and mean).
I’m sure what most of you care the most about are the facts, and so I’ve compiled the results of the SunSpider JavaScript Benchmark test for each of the different browsers. All of the tests below were performed on the same Windows machine, and the Firefox 3 nightly builds definitely came out on top. Here are the results sorted from best to worst (each one is hyperlinked to the full stats):
- Firefox 3 Nightly (PGO Optimized): 7263.8ms
- Firefox 3 Nightly (02/25/2008 build): 8219.4ms
- Opera 9.5.9807 Beta: 10824.0ms
- Firefox 3 Beta 3: 16080.6ms
- Safari 3.0.4 Beta: 18012.6ms
- Firefox 2.0.0.12: 29376.4ms
- Internet Explorer 7: 72375.0ms
It’s important to know that every time you run the SunSpider Benchmark it conducts each test five times, and the result is the average of the five tests. So it is a rather thorough test, and definitely shows off the speed improvements that Firefox 3 is going to be bringing to the table.
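The run-several-times-and-average idea is easy to reproduce for your own snippets. This is a minimal sketch, not SunSpider's actual harness: `averageRunTime` is a hypothetical name, and the `now` parameter is only there so the clock can be swapped out.

```javascript
// Hypothetical harness: run `fn` several times and average the elapsed time.
// `now` is injectable so the clock can be replaced in tests.
function averageRunTime(fn, runs = 5, now = Date.now) {
  let total = 0;
  for (let i = 0; i < runs; i++) {
    const start = now();
    fn();
    total += now() - start;
  }
  return total / runs; // mean elapsed milliseconds per run
}

// Usage: time a small string-building loop.
const meanMs = averageRunTime(function () {
  let s = "";
  for (let i = 0; i < 10000; i++) s += i;
});
console.log("mean ms per run:", meanMs);
```

Averaging smooths out one-off hiccups, but Date.now only has millisecond resolution, so very fast snippets need more work per run to produce meaningful numbers.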
Brendan has said that they are not finished with their performance work for Firefox 3, and I am sure the WebKit team isn't sitting on their hands.... oh and what about IE 8? It will be fun when that is in the wild to be tested.
Ben Lisbakken, who sits 100 feet away from me, developed gearsAJAXHelper, a library that bridges the AJAX search and feed APIs with Gears, to get a speed improvement:
We decided it would be cool to write a small library to make it easy for you AJAX APIs developers to write quick-loading, always fresh searches/feeds. The gearsAJAXHelper has two main features - it allows you to store and return key/value pairs from the local database, and it allows you to choose whether you want all resources files on the page (images, CSS, Javascript, HTML) to automatically be cached in the Gears cache.
The key/value pair database feature lets you store the query/results as a key/value pair. Then, the next time the query is made, the results can be served from the database while fresh results are being retrieved. This dramatically reduces the latency in queries/feed grabs.
The (optional) automatic caching of resource files will make it so that each time the user visits your webpage they will be getting resources served from their Google Gears cache, not new versions from the internet. Be careful when using this feature, as you might not want stale content to be served. There is also a refresh function, to clear the Google Gears cache of old files.
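The serve-from-the-store-while-fetching-fresh pattern described for the key/value feature can be sketched in a few lines. This is not the gearsAJAXHelper API: `createCachedQuery` and `fetchFresh` are hypothetical names, and a plain Map stands in for the Gears database.

```javascript
// Hypothetical sketch; not the gearsAJAXHelper API.
// A Map stands in for the local Gears database.
function createCachedQuery(fetchFresh, store = new Map()) {
  return function query(key, onResult) {
    if (store.has(key)) {
      onResult(store.get(key), "cache"); // answer immediately from the local store
    }
    fetchFresh(key, function (results) {
      store.set(key, results);    // keep the newest results for next time
      onResult(results, "fresh"); // then deliver the fresh results
    });
  };
}

// Usage with a fake fetcher that answers instantly.
let version = 0;
const fakeSearch = (key, done) => done(key + " results v" + (++version));
const search = createCachedQuery(fakeSearch);
const seen = [];
search("ajax", (results, source) => seen.push(source + ": " + results));
search("ajax", (results, source) => seen.push(source + ": " + results));
console.log(seen);
```

The second query is answered twice: once from the store straight away, and again when the fresh results arrive.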
You can take a look at the sample, that saves the content for presidential candidates so if you click back on an area, you get instant loads.
The bulk of the API is:
This example shows how you can optionally add super-caching to your applications with technology such as Gears. It ain't just about offline!
NOTE: JavaScript Hackathon
If you are in the bay area, and like JavaScript (and if not, why are you on this blog!), then join us at Google this Friday for a Google Developer Hackathon focusing on JavaScript.
There will be two sessions -- one from 2:00PM - 5:30PM and another from 6:00PM - 10:00PM. You are welcome to stay for both. Please RSVP
Where: Google Campus: 1600 Amphitheatre Pkwy, Mountain View, CA 94043. It will be held in the Seville Tech Talk room.
Brian Moschel just told us about Include:
It determines which files to compress at runtime and automatically compresses them into one script using Dean Edwards' Packer.
You can include any JavaScript from any other JavaScript with a relative path:
Then turn on compression like this:
When you reload the page, a window will open that contains a list of the scripts as they are loaded, the uncompressed collection of code, and the code compressed with Packer. You save the compressed code on your server and turn on production mode:
Scripts load in the same order across all browsers (last-in first-out), which is nice, considering document.write by default works differently in Opera.
Another aspect we're excited about is that Include makes it so you'll never have to write a custom server-side compression script again. Since the scripts to compress are determined at runtime, you can easily compress large libraries with conditional plugins, like TinyMCE.
Instead of duplicating that logic in a server-side script, you can choose your plugins, turn on compress mode, and you've got your compressed code. To demonstrate this, we ported TinyMCE and plugins to use Include.
Include is open-source with an MIT license. I hope you find it as useful as we have.
The MooTools folks have added Dojo 1.0.2 to the set of tests on their Slickspeed. It is actually quite cool of them to put up this test and compare other frameworks.
I just ran it on Firefox beta 3 on the Mac and the final results (for what it is worth) were:
Dojo comes up top, and gets a lot of green bars, as does jQuery, and even MooTools. Prototype wins on ".classname" which is very commonly used.
Wayne Shea and Tenni Theurer have continued their performance series by delving into the iPhone and its poor little cache.
I always wonder why the cache is so small. It is typical Apple to not allow an expert mode where you can tweak it. I would rather have a few fewer songs and have a larger cache. But, Steve knows best ;)
The end result of the article is that you should follow this ideal rule:
Reduce the size of each component to 25 Kbytes or less for optimal caching behavior
Given that the wireless network speed on iPhone is limited and the browser cache is cleared across power cycle, it is even more important to make fewer HTTP requests to achieve good performance than in the desktop world. To reduce the number of HTTP requests, Safari on iPhone supports image map, CSS sprites, inline images and inline CSS images. Take advantage of the browser cache whenever possible. If an external component can be shared across multiple pages in the site, remember that each individual component has to be smaller than 25 KB to be cacheable. Also, the maximum cache limit of all components is 475 - 500 KB. Minify all the JavaScript, CSS and HTML. For components that aren’t shared across multiple pages, consider making them inline.
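If you want to check a page against these numbers, a small script will do. The limits below come from the article, but everything else is an assumption of this sketch: the helper name, treating 1 KB as 1024 bytes, using the lower end of the 475 - 500 KB range, and the sample file sizes.

```javascript
// Limits taken from the article; treating 1 KB as 1024 bytes is an assumption.
const MAX_COMPONENT_BYTES = 25 * 1024;
const MAX_TOTAL_BYTES = 475 * 1024; // lower end of the quoted 475 - 500 KB range

// Hypothetical helper: report which components exceed the per-file limit
// and whether the cacheable ones fit in the total budget.
function checkCacheBudget(components) {
  const tooLarge = components.filter((c) => c.bytes > MAX_COMPONENT_BYTES);
  const cacheableBytes = components
    .filter((c) => c.bytes <= MAX_COMPONENT_BYTES)
    .reduce((sum, c) => sum + c.bytes, 0);
  return {
    tooLarge: tooLarge.map((c) => c.name),
    cacheableBytes,
    fitsInCache: cacheableBytes <= MAX_TOTAL_BYTES,
  };
}

// Made-up sizes for illustration.
const report = checkCacheBudget([
  { name: "library.js", bytes: 60 * 1024 },
  { name: "site.css", bytes: 12 * 1024 },
  { name: "app.js", bytes: 20 * 1024 },
]);
console.log(report);
```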
This of course is quite painful if you like to package JavaScript in One Large File for other performance reasons, or if you use a library that is larger than 25KB!
The iPhone can tell us a bunch of things about a site. If I go to TechCrunch for example, it drives me batty as it does a bunch of JavaScript to load in the CrunchBase widgets, and the iPhone keeps thinking it is loading. The blue bar keeps going, and the browser isn't as responsive. I hate those Crunchbase widgets :)
John Resig has analyzed JavaScript library loading speed by looking into the recent PbWiki testing results.
He delves into the fact that file size != speed and puts out the simple formula:
Total_Speed = Time_to_Download + Time_to_Evaluate
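To make the formula concrete, here is a tiny worked example. The figures are invented purely for illustration and are not taken from the PbWiki results. They show how a file that downloads faster can still lose overall if it takes longer to evaluate.

```javascript
// Total_Speed = Time_to_Download + Time_to_Evaluate
// All figures below are made up for illustration.
function totalTime(file) {
  return file.downloadMs + file.evaluateMs;
}

const minified = { name: "minified + gzip", downloadMs: 120, evaluateMs: 15 };
const packed = { name: "packed + gzip", downloadMs: 100, evaluateMs: 60 };

for (const file of [minified, packed]) {
  console.log(file.name + ": " + totalTime(file) + " ms");
}
```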
We also seem to obsess about packing and minification, where it often doesn't give us that much since the act of gzipping the data often does enough. That being said, if you have a lot of JavaScript it can certainly be worthwhile. It matters the most that the frameworks themselves (which are normally bigger than the app) play nice.
In fact, walk around your site with Firebug/YSlow and see if you have set your headers up correctly. After watching Steve Souders at work, it boggles my mind how many big sites are misconfigured (let alone small ones).
Chris Double attended the Tamarin Tech summit, and gives us some information about Tamarin Tracing, the new trace-based JIT experiment:
'Tamarin Tracing' is an implementation that uses a 'tracing jit'. This type of 'just in time compiler' traces code executing during hotspots and compiles it so when those hotspots are entered again the compiled code is run instead. It traces each statement executed, including within other function calls, and this entire execution path is compiled. This is different from compiling individual functions. You can gain more information for the optimizer to operate on, and remove some of the overhead of the calls. Anytime the compiled code makes a call to code that has not been jitted, the interpreter is called to continue.
Apparently the JIT for Lua is also being written using a tracing jit method and a post by Mike Pall describes the approach they are taking in some detail and lists references. A followup post provides more information and mentions Tamarin Tracing.
Brendan Eich talked about trace-based JITs as being the future of JavaScript VMs, and a reason why we will see insanely fast JavaScript without needing all of the type fun.
Here is one simple benchmark of a Fibonacci solution that doesn't do any caching tricks:
# Turn off the tracer
$ shell/avmshell -lifespan -interp fib.abc
fib 30 = 1346269
Run time was 26249 msec = 26.25 sec
# Turn on the tracer
$ shell/avmshell -lifespan fib.abc
fib 30 = 1346269
Run time was 1967 msec = 1.97 sec
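The benchmark source itself isn't shown, so the following is only a guess at its shape, written in JavaScript: a naive recursive Fibonacci with no caching. The base cases return 1 because that is what makes fib 30 come out to the 1346269 printed in the run.

```javascript
// Naive recursive Fibonacci with no memoization.
// Base cases return 1 so that fib(30) matches the benchmark's printed result.
function fib(n) {
  return n < 2 ? 1 : fib(n - 1) + fib(n - 2);
}

console.log("fib 30 = " + fib(30));
```

The call tree here makes millions of small function calls, which is the kind of hot path a tracing JIT is meant to speed up.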
Chris finishes with some fun facts: