Ars Technica has a new columnist, John Resig. His first piece is on Extreme JavaScript Performance which has started to come to us in abundance recently!
His article focuses on the latest updates to the fish, SquirrelFish Extreme:
A popular technique that is gaining traction amongst JavaScript engine implementers is that of optimizing the engine, while it's still processing the JavaScript code, to determine the "type" of the object that is being used. Since JavaScript doesn't include any sort of explicit type system JavaScript engines are frequently forced to check and re-check the values that they are handling, to insure their integrity. SFX rounds out the collection of other modern JavaScript engines, namely V8 and TraceMonkey, to provide this form of polymorphic inline caching. Interestingly, the idea for this form of caching comes from the Self programming language, the origin of many of the ideas in JavaScript (such as using prototypal inheritance instead of the more-common classical form of object inheritance seen in languages like Java).
JavaScript engines are serving as the test bed for new forms of dynamic language optimization. No other language is seeing this level of competition and rapid improvement that JavaScript is. This is optimal considering that JavaScript is one of the most widely-deployed programming languages available.
The SquirrelFish Extreme release currently stands as the fastest JavaScript engine [based on SunSpider] (although that's certain to change as healthy competition continues).
Based on its performance on the regexes it does handle, WREC (WebKit Regular Expression Compiler) is indeed an awesome design. regexp-dna.js, however, is flawed and exaggerates SFX performance.
We could use nanojit to make a regex compiler for SpiderMonkey that would perform as well as WREC. But I don’t know if it’s worthwhile yet. Regex performance is much less important for today’s web than it is for SunSpider–I hope to link to a report on that in a future post.
That was the conclusion that David Mandelin of the Tamarin project as he looked into how "SquirrelFish Extreme (SFX) is kicking our butts so badly on regexp-dna.js."
I love David's posts, as they go into the real meat of the tech:
Technical details: the design of WREC. There are two main ways to implement regular expressions: using a backtracking matching engine, or by transforming the regex to a finite automaton (NFA, aka “state machine”), which does not backtrack. Most Perl-type regex engines, including both SpiderMonkey’s and WREC, follow the backtracking design. I don’t know the exact history of that choice, but at present it is much easier to implement features like group capture and backreferences in the backtracking design. Also, although some regexes scale only if implemented as NFAs, my tests suggest that many simple regexes, including those in SunSpider, are faster with backtracking.
As of this writing, WREC’s implementation strategy is dirt simple (which is a good thing). There are no transformations or fancy optimizations on the regex. WREC simply generates native code that directly implements the backtracking search. Thus, within a single match operation, there are no function calls, no traversals of regular expression ASTs, and few option tests, so almost all of the overhead is eliminated.
WREC’s code is very easy to read, so if you want to know exactly how it works, just read it in WREC.cpp. It’s also great example code for anyone implementing a compiler for a simple language like regular expressions. The basic plan is to parse the regular expression with functions named things like parseDisjunction (the | operator). Those functions directly call functions like generateDisjunction that generate the native code using the same assembler that the call-threading interpreter uses. There’s also the oddly named “gererateParenthesesResetTrampoline”. Inexplicably preserved typo, or watermark to detect copying of WREC code?
Steve Souders is launching Hammerhead today at The Ajax Experience.
What is Hammerhead? I kinda think of it as continuous integration for performance. It is a Firebug plugin that you can setup to monitor the performance of your application. Imagine if you add a new feature that you think will speed things up, this tool will let you know how performance was really affected.
There are also cool features when you just want to whip it up on your own Firebug:
Even if you’re not hammering a site, other features make Hammerhead a useful add-on. The Cache & Time panel, shown in Figure 3, shows the current URL’s load time. It also contains buttons to clear the disk and memory cache, or just the memory cache. It has another feature that I haven’t seen anywhere else. You can choose to have Hammerhead clear these caches after every page view. This is a nice feature for me when I’m loading the same page again and again to see it’s performance in an empty or a primed cache state. If you forget to switch this back, it gets reset automatically next time you restart Firefox.
Finally, Steve Lamm posted on the Google Code blog about testing slower connections as well as the high speed one that you are probably on, and the techniques for doing that with Hammerhead.
Steve continues to come up with small useful tools for Web developers. Thanks Steve!
While Ben and I were talking about JavaScript performance (and other things) at Web 2.0 Expo NYC, Maciej Stachowiak announced SquirrelFish Extreme, the very new and improved version that appears to do very well at SunSpider:
SquirrelFish Extreme: 943.3 ms V8: 1280.6 ms TraceMonkey: 1464.6 ms
What makes it so fast?
SquirrelFish Extreme uses four different technologies to deliver much better performance than the original SquirrelFish: bytecode optimizations, polymorphic inline caching, a lightweight “context threaded” JIT compiler, and a new regular expression engine that uses our JIT infrastructure.
1. Bytecode Optimizations
When we first announced SquirrelFish, we mentioned that we thought that the basic design had lots of room for improvement from optimizations at the bytecode level. Thanks to hard work by Oliver Hunt, Geoff Garen, Cameron Zwarich, myself and others, we implemented lots of effective optimizations at the bytecode level.
One of the things we did was to optimize within opcodes. Many JavaScript operations are highly polymorphic - they have different behavior in lots of different cases. Just by checking for the most common and fastest cases first, you can speed up JavaScript programs quite a bit.
In addition, we’ve improved the bytecode instruction set, and built optimizations that take advantage of these improvements. We’ve added combo instructions, peephole optimizations, faster handling of constants and some specialized opcodes for common cases of general operations.
2. Polymorphic Inline Cache
One of our most exciting new optimizations in SquirrelFish Extreme is a polymorphic inline cache. This is an old technique originally developed for the Self language, which other JavaScript engines have used to good effect.
Here is the basic idea: JavaScript is an incredibly dynamic language by design. But in most programs, many objects are actually used in a way that resembles more structured object-oriented classes. For example, many JavaScript libraries are designed to use objects with “x” and “y” properties, and only those properties, to represent points. We can use this knowledge to optimize the case where many objects have the same underlying structure - as people in the dynamic language community say, “you can cheat as long as you don’t get caught”.
So how exactly do we cheat? We detect when objects actually have the same underlying structure — the same properties in the same order — and associate them with a structure identifier, or StructureID. Whenever a property access is performed, we do the usual hash lookup (using our highly optimized hashtables) the first time, and record the StructureID and the offset where the property was found. Subsequent times, we check for a match on the StructureID - usually the same piece of code will be working on objects of the same structure. If we get a hit, we can use the cached offset to perform the lookup in only a few machine instructions, which is much faster than hashing.
Here is the classic Self paper that describes the original technique. You can look at Geoff’s implementation of the StructureID class in Subversion to see more details of how we did it.
We’ve only taken the first steps on polymorphic inline caching. We have lots of ideas on how to improve the technique to get even more speed. But already, you’ll see a huge difference on performance tests where the bottleneck is object property access.
3. Context Threaded JIT
Another major change we’ve made with SFX is to introduce native code generation. Our starting point is a technique called a “context threaded interpreter”, which is a bit of a misnomer, because this is actually a simple but effective form of JIT compiler. In the original SquirrelFish announcement, we described our use of direct threading, which is about the fastest form of bytecode intepretation short of generating native code. Context threading takes the next step and introduces some native code generation.
The basic idea of context threading is to convert bytecode to native code, one opcode at a time. Complex opcodes are converted to function calls into the language runtime. Simple opcodes, or in some cases the common fast paths of otherwise complex opcodes, are inlined directly into the native code stream. This has two major advantages. First, the control flow between opcodes is directly exposed to the CPU as straight line code, so much dispatch overhead is removed. Second, many branches that were formally between opcodes are now inline, and made highly predictable to the CPU’s branch predictor.
Here is a paper describing the basic idea of context threading. Our initial prototype of context threading was created by Gavin Barraclough. Several of us helped him polish it and tune the performance over the past few weeks.
One of the great things about our lightweight JIT is that there’s only about 4,000 lines of code involved in native code generation. All the other code remains cross platform. It’s also surprisingly hackable. If you thought compiling to native code is rocket science, think again. Besides Gavin, most of us have little prior experience with native codegen, but we were able to jump right in.
Currently the code is limited to x86 32-bit, but we plan to refactor and add support for more CPU architectures. CPUs that are not yet supported by the JIT can still use the interpreter. We also think we can get a lot more speedups out of the JIT through techniques such as type specialization, better register allocation and liveness analysis. The SquirrelFish bytecode is a good representation for making many of these kinds of transforms.
4. Regular Expression JIT
As we built the basic JIT infrastructure for the main JavaScript language, we found that we could easily apply it to regular expressions as well, and get up to a 5x speedup on regular expression matching. So we went ahead and did that. Not all code spends a bunch of time in regexps, but with the speed of our new regular expression engine, WREC (the WebKit Regular Expression Compiler), you can write the kind of text processing code you’d want to do in Perl or Python or Ruby, and do it in JavaScript instead. In fact we believe that in many cases our regular expression engine will beat the highly tuned regexp processing in those other languages.
Since the SunSpider JavaScript benchmark has a fair amount of regexp content, some may feel that developing a regexp JIT is an “unfair” advantage. A year ago, regexp processing was a fairly small part of the test, but JS engines have improved in other areas a lot more than on regexps. For example, most of the individual tests on SunSpider have gotten 5-10x faster in JavaScriptCore — in some cases over 70x faster than the Safari 3.0 version of WebKit. But until recently, regexp performance hadn’t improved much at all.
We thought that making regular expressions fast was a better thing to do than changing the benchmark. A lot of real tasks on the web involve a lot of regexp processing. After all, fundamental tasks on the web, like JSON validation and parsing, depend on regular expressions. And emerging technologies — like John Resig’s processing.js library — extend that dependency ever further.
Major kudos to the entire SFX team for pulling this off. Now, to grab a new nightly...
Steve has found a new tidbit that has him excited. The feature at hand comes from Opera
Primarily for low bandwidth devices, not well-tested on desktop. Ignore script tags until entire document is parsed and rendered, then execute all scripts in order and re-render.
Steve explains how you he is a fan of splitting up JavaScript into a small core, and then loading other functionality asynchronously later. This defer gives you some of that benefit, and also groks document.write, which no other technique works with:
One limitation of these techniques is that you can’t use document.write, because when a script is loaded asynchronously the browser has already written the document. Hardcore JavaScript programmers avoid document.write, but it’s still used in the real world most notably, and infamously, by ads. A feature of Opera’s “Delayed Script Execution” option is that, even though scripts are deferred, document.write still works correctly. Opera remembers the script’s location in the page and inserts the document.write output appropriately.
So, this is why Steve is interested to dive deeper and see if this has the performance benefits that make sense theoretically:
One immediate benefit of this Opera preference is that web developers can see the impact of delay-loading their JavaScript. A practice I’m advocating a lot lately is splitting a large JavaScript payload into two pieces, one of which can be loaded using an asynchronous script loading technique. This is often a complex task as the JavaScript payload grows in size and complexity. With this “Delayed Script Execution” feature in Opera, developers can get an idea of how their page would feel before undertaking the heavy lifting.
I’m even more excited about how this shows us what is possible for the future. To be able to have asynchronous script loading and preserve document.write output is like having your cake and eating it too. It’s difficult for users to find this feature in Opera. And it’s beyond the reach of web developers. But if Opera’s “Delayed Script Execution” behavior was the basis for implementing SCRIPT DEFER in all browsers, it would open the door for significant performance improvements by simply adding six characters (”DEFER ”).
This is most significant for the serving of ads. Often ads are served by including a script that contains document.write to load other resources: images, flash, or even another script. Ads are typically placed high in the page, which means today’s pages suffer from slow loading ads because all their content gets blocked. And really, it’s not the pages that suffer, it’s the users. Our experience suffers. Everyone’s experience suffers. If browsers supported an implementation of SCRIPT DEFER that behaved similar to Opera’s “Delayed Script Execution” feature, we’d all be better off.
Food for thought for Safari, Firefox, and IE.
Sameer Chabungbam of Microsoft posted about the new JScript profiler the includes the following functionality:

Eric Pascarello has also been looking at new tools, and wrote up his experience with the Google Chrome Debugger. He details the breakpoint walking functionality as well as the many commands available.
Brendan Eich jumped right in and benchmarked the tip of tree for TraceMonkey, with the V8 version that came with Google Chrome:
We win on the bit-banging, string, and regular expression benchmarks. We are around 4x faster at the SunSpider micro-benchmarks than V8.
This graph does show V8 cleaning our clock on a couple of recursion-heavy tests. We have a plan, to trace recursion (not just tail recursion). We simply haven't had enough hours in the day to get to it, but it's "next".
Brendan shows SunSpider running there, and V8 has that and other benchmarks to run too. Isn't it great when a performance arms war is on? Thank god for competition here. We all win.
Ray Cromwell ran tests himself, on his own app Chronoscope (note, probably NOT using tip of tree TraceMonkey):
Chronoscope is written in GWT, and to some extent, the GWT compiler may negate some of Chrome's V8 technology in the sense that GWT "de-classes" many OO polymorphic dispatches into a more functional style of programming, removing as much dynamic dispatch as possible, and eliminating prototype lookups and function call overhead through inlining. I don't know if GWT hurts "hidden classes" or not, but it might be possible that if GWT didn't provide such optimizations, the performance differential might be larger.
Despite this, the results are still good. The test consisted of calling the chart's redraw() function 100 times per trial, with 10 trials. The slowest and fastest trial are thrown out, and the mean and standard deviation are calculated on the remaining data.
I tested on a Mac Pro 2.66Ghz with 6Gb of memory, OSX 1.5. The tests were conducted within a Parallels VM running XP2 Service Pack 2, given 2 CPUs and 2Gb of memory. For each browser, I rebooted the VM from a clean start, and ran only the test browser.
![]()
And for a bit of fun, Marc-Andre Cournoyer tied together HotRuby (remember that? the beast that runs YARV code in the browser!) and V8 to create fast Ruby in the browser.
Good times.
Razor Profiler is a web-based Ajax profiling tool to help web developers understand and analyze the runtime behavior of their JavaScript code in a cross-browser environment. Razor Profiler can be access either online as a service; or be downloaded to run locally, and was created by Coach Wei who has done a lot of work for Nexaweb and Apache.
Razor Profiler Features
Razor Profiler automates JavaScript profiling:
How Does Razor Profiler Work?
Razor Profiler composes of a server component that runs inside a standard Java EE Servlet engine, and a JavaScript-based client component that runs inside any browser. Once you have Razor server started, you can profile your JavaScript application by entering the start URL of your application into Razor Profiler and run through your test scenarios. Razor Profiler will automatically record data and visualize them for your analysis. There is no client side installation, browser configuration change or application code change required. In order to achieve this, Razor Profiler goes through five different phases:

You have seen this before: /path/to/something.js?v=2, or maybe it used a date or a version control id or some such. The notion of putting the version into the URL so you can aggressively cache and yet quickly push new versions.
There has long been issues with using the querystring as the version. At some point I seem to remember Safari not going a good job caching that scenario and thinking that it was different.
Steve "Neo" Souders has posted about this issue especially as it relates to proxy servers and default configurations:
There’s a section in my book called Revving Filenames. It contains an example of adding a version number to the filename. That’s prompted several emails where people have asked me about tradeoffs around using a querystring versus embedding something in the filename. I wasn’t aware of any performance difference, but in a meeting this week a co-worker, Jacob Hoffman-Andrews, mentioned that Squid, a popular proxy, doesn’t cache resources with a querystring. This hurts performance when multiple users behind a proxy cache request the same file - rather than using the cached version everybody would have to send a request to the origin server.
I tested this by creating two resources,
mylogo.1.2.gifandmylogo.gif?v=1.2. Both have a far future Expires date. I configured my browser to go through a Squid proxy. I made one request tomylogo.1.2.gif, cleared my cache (to simulate another user making the request), and fetchedmylogo.1.2.gifagain. This produces the following HTTP headers:>> GET http://stevesouders.com/mylogo.1.2.gif HTTP/1.1 << HTTP/1.0 200 OK << Date: Sat, 23 Aug 2008 00:17:22 GMT << Expires: Tue, 21 Aug 2018 00:17:22 GMT << X-Cache: MISS from someserver.com << X-Cache-Lookup: MISS from someserver.com >> GET http://stevesouders.com/mylogo.1.2.gif HTTP/1.1 << HTTP/1.0 200 OK << Date: Sat, 23 Aug 2008 00:17:22 GMT << Expires: Tue, 21 Aug 2018 00:17:22 GMT << X-Cache: HIT from someserver.com << X-Cache-Lookup: HIT from someserver.comNotice that the second response shows a HIT in the X-Cache and X-Cache-Lookup headers. This shows it was served by the Squid proxy. More evidence of this is the fact that the Date and Expires response headers have the same values, even though I made these requests 10 seconds apart. For conclusive evidence, only one hit shows up in the stevesouders.com access log.
Loading
mylogo.gif?v=1.2twice (clearing the cache in between) results in these headers:>> GET http://stevesouders.com/mylogo.gif?v=1.2 HTTP/1.1 << HTTP/1.0 200 OK << Date: Sat, 23 Aug 2008 00:19:34 GMT << Expires: Tue, 21 Aug 2018 00:19:34 GMT << X-Cache: MISS from someserver.com << X-Cache-Lookup: MISS from someserver.com >> GET http://stevesouders.com/mylogo.gif?v=1.2 HTTP/1.1 << HTTP/1.0 200 OK << Date: Sat, 23 Aug 2008 00:19:47 GMT << Expires: Tue, 21 Aug 2018 00:19:47 GMT << X-Cache: MISS from someserver.com << X-Cache-Lookup: MISS from someserver.comHere it’s clear the second response was not served by the proxy: the caching response headers say MISS, the Date and Expires values change, and tailing the stevesouders.com access log shows two hits.
Proxy administrators can change the configuration to support caching resources with a querystring, when the caching headers indicate that is appropriate. But the default configuration is what web developers should expect to encounter most frequently. Another interesting note about these tests: notice how the proxy downgrades the responses to HTTP/1.0. This is going to alter browser behavior in terms of the number of connections that are opened. When I’m doing performance analysis I make sure to avoid being connected through a proxy.
This is the hack that John Grubber used to test whether iPhone 2.x had snuck in SquirrelFish. He was curious due to the performance improvements that he witnessed:

What about iPhone limits though? David Golightly tests the limits on the iPhone with a script that keeps downloading tiles until it can no longer do so:
After downloading about 210 images, the iPhone simply stops downloading new ones. This is probably due to hitting the hard 30MB same-page resource limit.
John Resig continues his streak of compelling blog entries with "DOM DocumentFragments" where he shows that:
A method that is largely ignored in modern web development can provide some serious (2-3x) performance improvements to your DOM manipulation.
The technique shown is compatible across a large swath of modern browsers, including our friend IE6. Here's an example of using DocumentFragments:
Kimble Young has created Webslug, the "hot or not of website performance."
It was inspired by webwait but lets you compare the load times of sites and records every performance test for later analysis like browser used, country of origin, top competitors etc.
For example, comparing reddit to Digg:
It is useful to compare versions of your own site, too.

Mario Heiderich has released qUIpt, a library that uses the window.name property to store away useful data, in this case JavaScript.
How does it work?

Mario Heiderich has released qUIpt, a library that uses the window.name property to store away useful data, in this case JavaScript.
How does it work?
Steve Souders has a wrap up on the Velocity conference that he co-chaired. He links to his favourite content from the show, which contains a lot of Ajax related work. It was really good to hear snippets form the show such as Eric Lawrence of Microsoft saying "we hope to make Steve's book out of date" as we see browser vendors look more and more on performance.
Just after I posted about Ernest's canvas experiment with photos he put something else up that tests the performance of rendering polygons with Canvas compared to other techniques.
The demo lets you run a live test, and view saved tests, comparing the Google Maps interface, which "currently draws polygons using VML for Internet Explorer, SVG for Firefox and image retrieval for Safari and Firefox linux."
At first the results were surprising. The canvas version was magnitudes faster. However, then they worked out that the live Google Maps version is actually doing a lot more than just drawing the polygons, that being said, a commenter had a valid point:
If we analyze the rendering time of the markup alone, both SVG and VML are not necessarily slower than canvas and canvas+excanvas.js. So the difference in performance is due to the implementation of polygons before the markup is output which the canvas implementation is skipping.
That doesn't make the experiment invalid. You didn't show that Canvas is faster than SVG or VML.
But you did show that it's possible to get much better polygon performance than the current API using a more direct to the metal approach - with whatever rendering engine. And people are crying out for faster polygons.
Keeping in the performance vein, YSlow put a new version out in time for Firefox 3.
The new version includes:
Eric Falcao has released Clientperf, a simple client-side Rails performance plugin.
The tool came about as Eric is giving a talk on "14 rules of high-performance websites in the typical rails mongrel/nginx stack, the main idea being to focus on some of the important implementation details when it comes to client-side performance optimization."
As I was planning, I realized that there was no simple as in the we’re-all-spoiled-with-rails simple way to measure client download times in production. Now, there is clientperf. It’s just a start, but decent enough to benchmark the actual client performance impact of any optimizations you make.
How it works
It injects javascript into the page that takes a timestamp at the top of the page and at the bottom of the page. Once the browser is done downloading, evaluating and rendering all assets, clientperf makes one last image request to your server with the start time, end time and the URL. Piece of cake.
Today is the kick off of the Velocity performance conference, and we are going to see a fair share of performance news over the next day or two.
To start out, Bill Scott (Rico/ex-Yahoo/now Netflix) has announced a new Firebug plugin, Jiffy that adds a new tab showing fine grained performance data. You want to know the time between the onunload of the previous page, the first rendering, time until onload, time after, and more.
This is where Jiffy-Web comes in. Jiffy-Web is a fine-grained and flexible website performance tracking and analysis suite written by Scott Ruthfield and the team at Whitepages.com.
The Firebug plugin uses that data, which it gets from the DOM JSON object, to do the visualization.
Bill wrote a detailed post on Measuring User Experience Performance that goes into the details behind this tool.
He goes into detail on how to measure things, and what can get in the way. For example, onunload:
The most logical place to measure the start of a request ("from Click") is on the originating page (see A in figure above). The straighforward approach is to add a timing capture to the unload event (or onbeforeunload). More than one technique exist for persisting this measurement, but the most common way is to write the timing information (like URL, user agent, start time, etc.) to a cookie.
However, there is a downside to this methodology. If the user navigates to your home page from elsewhere (e.g., from a google search), then there will be no "start time" captured since the unload event never happened on your site. So we need a more consistent "start time".
We address this by providing an alternate start time. We instrument a time capture at the very earliest point in the servlet that handles the request at the beginning of the response (see B in figure above). This guarantees that we will always have a start time. While it does miss the time it takes to handle the request, it ends up capturing the important part of the round trip time -- from response generation outward.
There are a number of ways to save this information so that it can be passed along through the response cycle to finally be logged. You can write out a server-side cookie. You can generate JSON objects that get embedded in the page. You could even pass along parameters in the URL (though this would not be desirable for a number of reasons). The point is you will need a way to persist the data until it gets out to the generated page for logging.
Note that the absolute time captured here is in server clock time and not client clock time. There is no guarantee these values will be in sync. We will discuss how we handle this later.
He also talks about practical issues that he has found implementing this at Netflix, and when the data shows you the real truth:
Recently we fielded a different variation of our star ratings widget. While it cut the number of HTTP requests in half for large Queue pages (a good thing) it actually degraded performance. Having real time performance data let us narrow down on the culprit. This feedback loops is an excellent learning tool for performance. With our significant customer base, large number of daily page hits we can get a really reliable read on the performance our users are experiencing. As a side note, the median is the best way to summarize our measurements as it nicely takes care of the outliers (think of the widely varying bandwidths, different browser performance profiles that can all affect measurements.)
Omar AL Zabir of Pageflakes.com has posted on ensure, his JavaScript library that provides a handy function ensure which allows you to load JavaScript, HTML, CSS on-demand and then execute your code.
Ensure ensures that relevant JavaScript and HTML snippets are already in the browser DOM before executing your code that uses them.
For example:
You can also specify multiple Javascripts, html or CSS files to ensure all of them are made available before executing the code:
Omar says:
Websites with rich client side effects (animations, validations, menus, popups) and Ajax websites require large amount of Javascript, HTML and CSS to be delivered to the browser on the same web page. Thus the initial loading time of a rich web page increases significantly as it takes quite some time to download the necessary components. Moreover, delivering all possible components upfront makes the page heavy and browser gets sluggish responding to actions. You sometimes see pull-down menus getting stuck, popups appearing slowly, window scroll feels sluggish and so on.
The solution is not to deliver all possible HTML, Javascript and CSS on initial load instead deliver them when needed. For example, when user hovers the mouse on menu bar, download necessary Javascript and CSS for the pull-down menu effect as well as the menu html that appears inside the pull-down. Similarly, if you have client side validations, deliver client side validation library, relevant warning HTML snippets and CSS when user clicks the 'submit' button. If you have a Ajax site which shows pages on demand, you can load the Ajax library itself only when user does the action that results in an Ajax call. Thus by breaking a complex page full of HTML, CSS and Javascript into smaller parts, you can significantly lower down the size of the initial delivery and thus load the initial page really fast and give user a fast smooth browsing experience.
There is a detailed writeup on how it all works, and it dovetails with the recent performance proposals around when to download resources (sometimes you may not want to wait for on demand loading of course).
Tom Trenka has followed up his last post on String performance with a deep dive on IE that dispells the myth of Array.join.
Tom goes through tons of tests across versions of IE and using varying sizes of data.
In Conclusion
First things first—with the performance improvements with IE7, we no longer need to consider using an alternate path when doing large scale string operations; using Array.join in an iterative situation gives you no major advantages than using += in the same situation. In addition, the differences with IE6 were slight enough to allow you to not bother forking for that specific version.
The only time considering using an array as opposed to a string for these kind of operations is when you are aware that the fragments you are appending are very large (on the order of > 65536 bytes); doing this will cause the GC issues Dan Pupius talks about in his analysis of object allocation and the JScript garbage collector.
From there, we can progress to programming techniques—with Internet Explorer, it is much better to call Builder.append with as many arguments as possible than to simply iterate and push things in one at a time.
It is also better to start small; try to structure your string operations so that very large string operations are minimized. In this case, using a temporary buffer to assemble a set of strings together and then adding them to a much larger string is better than constantly adding small fragments to a larger string.
And as always, minimizing the size of an iteration will help get extra performance out of JScript.
The raw numbers have been made available to scour over.
Bob Matsuoka has written a guest article on the topic of lazy script loading. Thanks so much Bob!
A recent article "Lazily load functionality via Unobtrusive Scripts" discussed how to lazily load Javascript script files by appending script elements to the HEAD tag.
While this works as expected, I've found that for best results, you should also consider tracking which scripts have been loaded in order to prevent re-loading an already loaded script, and more importantly supporting callbacks so that you can guarantee loading of scripts prior to calling functions that depend on that code.
NOTE: The example loader script, which has been tested in FF, IE, Safari, and Opera, uses prototype.js for DOM and array routines. I developed this originally for a project that already had prototype.js available, but it uses it only superficially. It should be simple to remove these references if you're not using prototype.js.