
Nicholas Zakas decided to dive deep on everyone's favorite sign that you've done something wrong:
Few web developers truly understand what triggers the long-running script dialog in various browsers, including myself. So I decided to sit down and figure out under what circumstances you’ll see this dialog. There are basically two different ways of determining that a script is long-running. First is by tracking how many statements have been executed and second is by timing how long the script takes to execute. Not surprisingly, the approach each browser takes is slightly different.
He finds that Internet Explorer's warning is based on the total number of statements executed (5 million, and since it's Windows, you can change it via the Registry), Firefox and Safari measure actual execution time (10 and 5 seconds, respectively), Chrome is a bit of a mystery, and Opera doesn't appear to have such a mechanism (and, interestingly, appears to put its UI on a different thread than page rendering and script execution).
Regardless of the details, the lesson remains the same (again quoting from Nicholas' post):
Brendan Eich, creator of JavaScript, is quoted as saying, “[JavaScript] that executes in whole seconds is probably doing something wrong…” My personal threshold is actually much smaller: no script should take longer than 100ms to execute on any browser at any time. If it takes any longer than that, the processing must be split up into smaller chunks.
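To make the "split it into smaller chunks" advice concrete, here is a minimal sketch of one way to do it. It is not from Nicholas' post: the name processInChunks, the options object and the 50ms default budget are all assumptions made for illustration. Each slice handles items until its time budget is used up, then yields back to the browser before continuing.

```javascript
// Hypothetical helper (not from the quoted post): process `items` in slices
// that each stay under `budgetMs`, yielding to the browser between slices.
// `schedule`, `now` and `done` are injectable so the logic can be tested.
function processInChunks(items, handleItem, options) {
  var opts = options || {};
  var budgetMs = opts.budgetMs || 50;
  var schedule = opts.schedule || function (fn) { setTimeout(fn, 0); };
  var now = opts.now || function () { return new Date().getTime(); };
  var done = opts.done || function () {};
  var index = 0;

  function runSlice() {
    var start = now();
    // Always handle at least one item so every slice makes progress.
    do {
      handleItem(items[index], index);
      index += 1;
    } while (index < items.length && now() - start < budgetMs);

    if (index < items.length) {
      schedule(runSlice); // give the UI a chance to update, then continue
    } else {
      done();
    }
  }

  if (items.length > 0) {
    schedule(runSlice);
  } else {
    done();
  }
}
```

A call might look like processInChunks(rows, renderRow, { budgetMs: 50 }), where rows and renderRow are whatever your page needs to work through.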
An interesting read!
Steve Souders has detailed the coupling of script loading with various asynchronous techniques with examples that show the timings that you can get. First he sets the scene:
One issue with async script loading is dealing with inline scripts that use symbols defined in the external script. If the external script is loading asynchronously without thought to the inlined code, race conditions may result in undefined symbol errors. It’s necessary to ensure that the async external script and the inline script are coupled in such a way that the inlined code isn’t executed until after the async script finishes loading.
There are a few ways to couple async scripts with inline scripts.
- window’s onload - The inlined code can be tied to the window’s onload event. This is pretty simple to implement, but the inlined code won’t execute as early as it could.
- script’s onreadystatechange - The inlined code can be tied to the script’s onreadystatechange and onload events. (You need to implement both to cover all popular browsers.) This code is lengthier and more complex, but ensures that the inlined code is called as soon as the script finishes loading.
- hardcoded callback - The external script can be modified to explicitly kickoff the inlined script through a callback function. This is fine if the external script and inline script are being developed by the same team, but doesn’t provide the flexibility needed to couple 3rd party scripts with inlined code.
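The second option in that list (the script's onload plus onreadystatechange) can be sketched roughly as follows. This is an illustrative sketch rather than Steve's exact code: loadScript and its parameters are made-up names, and the document object is passed in explicitly so the function can be exercised outside a browser.

```javascript
// Sketch only: `loadScript`, `doc` and `onLoaded` are illustrative names.
// Creates a script element, wires up both load notifications, and calls
// `onLoaded` exactly once when the external script has finished loading.
function loadScript(doc, src, onLoaded) {
  var script = doc.createElement('script');
  var called = false;

  function finish() {
    if (called) { return; } // guard in case a browser fires both events
    called = true;
    script.onload = script.onreadystatechange = null;
    onLoaded();
  }

  // Most browsers signal completion through onload.
  script.onload = finish;

  // Older Internet Explorer versions report progress through readyState.
  script.onreadystatechange = function () {
    if (script.readyState === 'loaded' || script.readyState === 'complete') {
      finish();
    }
  };

  script.src = src;
  doc.getElementsByTagName('head')[0].appendChild(script);
  return script;
}
```

In a page you would pass the real document and put the formerly inline code in the callback, so it cannot run before the external script has defined its symbols.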
Then he walks through an example: first a simple asynchronous loading technique, then coupling via John Resig’s degrading script tags pattern added to the end of the target script, and finally lazy loading it:
Finally, the conclusion:
Loading scripts asynchronously and lazyloading scripts improve page load times by avoiding the blocking behavior that scripts typically cause. This is shown in the different versions of adding sorttable to UA Profiler:
- Normal Script Tags - 487 ms
- Asynchronous Script Loading - 429 ms
- Lazyloading - ~320 ms
The times above indicate when the onload event occurred. For other web apps, improving when the asynchronously loaded functionality is attached might be a higher priority. In that case, the Asynchronous Script Loading version is slightly better (~400 ms versus 417 ms). In both cases, being able to couple inline scripts with the external script is a necessity. The technique shown here is a way to do that while also improving page load times.
I posted on my personal blog about using the crowd to tell us about browser responsiveness in which I discussed giving developers information about browser responsiveness and how add-ons can affect it:
I have had some folks talk to me about responsiveness issues with Firefox 3. I have had a fantastic experience, and currently I run Mozilla nightlies / Minefield / Shiretoko (3.1.*) and WebKit nightlies side by side. I am very happy with the shape that Minefield is in.
Of course, the issue with the extension mechanism with Firefox is that you get a window to the entire world (which has also been a reason that led to amazing add-ons). Since this is the case, a bad add-on can do a lot of damage.
Chrome does a good job showing you basic info about a tab (memory etc). What if we did that and more for add-ons? Give me top for the browser.
Now, this is a lot of engineering away, so can we use the crowd to help out?
What if we created an add-on that would track responsiveness information and send it back (anonymously) to the cloud (say, to Weave). We could use math to work out probable culprits and could even ship that information back to the people using the add-on. Thus, you would then find out that FooAddOn seems to be a culprit that slows down the browser. Maybe it could be called Vacinate-addon.
Then I got talking with Dav Glass who is working on a very interesting proof of concept BrowserPlus Profiler:
A service that analyzes the memory and cpu usage of a web browser. The service can take 1 sample or multiple samples at a specified interval. When sampling at intervals, at most 1,000 samples are taken. If you provide a callback function, your javascript will be called after every sample is taken. If no callback is provided, all samples are stored in an array and returned after start() completes or stop() is called.
The sample object is a map with the following keys (most values are floats):
- [sample] - the sample number (1-1,000)
- [time] - the string representation of the time the sample was taken
- [sys] - the percentage CPU "sys" processes are using
- [user] - the percentage CPU "user" processes are using
- [ffxcpu] - the percentage CPU Firefox is using, or -1.0 if it is not running
- [ffxmem] - the amount of memory Firefox is using, or -1.0 if it is not running
- [safcpu] - the percentage CPU Safari is using, or -1.0 if it is not running
- [safmem] - the amount of memory Safari is using, or -1.0 if it is not running
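Given an array of sample objects with the keys listed above, a small helper could summarize them. The sketch below is only an illustration of working with that data shape: averageOf is a hypothetical name, and nothing here calls the real service. It skips the -1.0 sentinel that marks a browser that was not running.

```javascript
// Hypothetical helper: average one key (for example "ffxcpu") across samples,
// ignoring the -1.0 sentinel that means "browser not running".
function averageOf(samples, key) {
  var total = 0;
  var count = 0;
  for (var i = 0; i < samples.length; i += 1) {
    var value = samples[i][key];
    if (typeof value === 'number' && value >= 0) {
      total += value;
      count += 1;
    }
  }
  // null signals that no usable sample was found for this key.
  return count === 0 ? null : total / count;
}
```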
This is very early stage, and they are looking for good people and ideas on how to get good data across platforms (browsers and operating systems). I would love to see this.

Steve Souders has updated his UA Profiler tool that tracks the performance traits of various browsers. Being able to drill down and see the differences from build to build is great stuff, and here are all the new features:
Previously, I had one label for a browser. For example, Firefox 3.0 and 3.1 results were all lumped under “Firefox 3”. This week I added the ability to drill down to see more detailed data. The results can be viewed in five ways:
Steve Souders has a nice performance roundup for 2008 that details some of the important utilities and knowledge that we gained this year.
His post gets even more interesting when he posits about the future, including:
Steve has been teaching a class on performance at Stanford this semester. You can check out his final and midterm to see if you could ace the exam. Also, you can see material from the guest lecturers. Check it out.
And, what are you looking forward to in 2009 wrt performance?

We posted on Steve's UA Profiler tool, and John Resig has taken a nice look at the current results.
It actually now looks like Minefield (Firefox nightly) is getting 10 out of 11, and the other browsers are doing great too.
Jonas Sicking of Mozilla has a really nice comment that talks about what the engines are doing and some nuances. For example, if you have a CSS file and a JS file, do you block just in case the JS looks into CSS values (e.g. "in case there is a call to .offsetTop in the script"). How about looking ahead to see? That is the case. You can download away and try to do the right thing. document.write() is another beast that seems to do a lot of harm. Having the browser be smart about it ("they don't do that") will be good.
Back to John, he also discusses features that we can use as developers:
Prefetching
This is part of the HTML 5 specification and allows for pages to specify resources which should be opportunistically downloaded in case they should be used in the future (the common example of image rollovers could be used here).
There's a full page describing how to use them on the Mozilla developer wiki but it isn't that hard to get started. It's as simple as including a new link element in the top of your site:
HTML:
<link rel="prefetch" href="/images/big.jpeg">
And that resource will be downloaded preemptively.
Inline Images
The final case that the profiler tests for is the ability of a browser to support inline images using a data: URI. Data URIs give developers the ability to include the image data directly within the page itself. While this saves an extra HTTP request it's important to note that the resource will not be cached (at least not as external resource - it may be cached as part of the complete page). The use of this technique will vary on a case-by-case basis but having a browser support it is absolutely important.
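As a small illustration of what a data: URI looks like in practice, here is a sketch that builds one for text-based content such as SVG markup. The function name toDataUri is made up for this example. Binary formats like PNG or JPEG would normally be base64-encoded instead, which this sketch does not cover.

```javascript
// Illustrative helper: build a data: URI for text-based content (for example
// SVG markup) by percent-encoding it. Binary images would need base64.
function toDataUri(mimeType, text) {
  return 'data:' + mimeType + ',' + encodeURIComponent(text);
}

// Example: an (empty) SVG element embedded directly in a URI.
var exampleUri = toDataUri('image/svg+xml', '<svg/>');
```

The resulting string can be used anywhere a URL is accepted, such as an img element's src attribute, in browsers that support data: URIs.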
This is officially the week of John. If he delivers top notch posts for the rest of the week he wins an Ajaxian award or something. Maybe we need to bring back the "pack of cards" where each card is an Ajax personality and John gets to be Ace of Hearts or something.
I remember talking with some of the V8 team about how poor the world of timing is. Chrome is a lot more accurate in its timing, which can do it a disservice in browser performance tests. Some browsers would respond with "0" when Chrome would return "0.001" and it would hence suffer.
Add that to the flawed "just add up the total time for all tests" mentality of some tests and you end up with very skewed results (you could do amazingly bad on one test that in practice never matters and really well on the others, but it all evens out).
Here comes John with a post on the accuracy of JavaScript timing which came out of a bad situation:
I was running some performance tests, on Internet Explorer, in the SlickSpeed selector test suite and noticed the result times drastically fluctuating. When trying to figure out if changes that you've made are beneficial, or not, it's incredibly difficult to have the times constantly shifting by 15 - 60ms every page reload.
This led him to run tests on various browsers and operating systems, and he put up the raw data for you to check out.
He concludes:
Testing JavaScript performance on Windows XP and Vista is a crapshoot, at best. With the system times constantly being rounded down to the last queried time (each about 15ms apart) the quality of performance results is seriously compromised. Dramatically improved performance test suites are going to be needed in order to filter out these impurities, going forward.
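One common way to work around a coarse clock is to keep repeating the code under test until the total elapsed time is much larger than the timer's granularity, and then divide. The sketch below shows that general idea; it is not John's code, and timePerCall and its parameters are hypothetical names. The clock is injectable so the behavior can be checked against a simulated 15ms timer.

```javascript
// Sketch: repeat `fn` until at least `minElapsedMs` has passed on the clock,
// then report the mean time per call. With a clock that only ticks every
// ~15ms, a large `minElapsedMs` keeps the rounding error proportionally small.
function timePerCall(fn, minElapsedMs, now) {
  var clock = now || function () { return new Date().getTime(); };
  var calls = 0;
  var elapsed = 0;
  var start = clock();
  do {
    fn();
    calls += 1;
    elapsed = clock() - start;
  } while (elapsed < minElapsedMs);
  return { calls: calls, elapsedMs: elapsed, msPerCall: elapsed / calls };
}
```

A single timed call of a 1ms function on such a clock would report either 0 or 15ms; averaging over many calls gives a figure much closer to the real cost.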
Robert Kieffer has announced JSLitmus, a tool "designed specifically to allow you to quickly and easily write a JavaScript test (or test suite), run it on any modern browser, and document and share the results."
To see it in action, Robert writes a test on "++" and plots the results for different browsers, and then draws some conclusions.
The API for creating a test is simple:
Once you define your tests you can run them in the page thanks to the popup, and then it will do its thing and give you a Google Chart at the other end of things.
A nice little tool.

Coach Wei has updated Razor Optimizer, "a JavaScript optimization tool for reducing code footprint and increasing runtime performance. As a cross-browser web application itself, Razor Optimizer can be accessed either online as a service or downloaded to run locally.
Razor Optimizer is based on a new approach for JavaScript optimization called "razor". While other optimization techniques such as JS minimization and concatenation are based on static lexical analysis, Razor uses dynamic runtime profile information to achieve breakthrough results of 60% to 90% savings."
How it works
Razor Optimizer itself is a web based JavaScript application that runs in any browser. It contains a server component and a client component. Razor Optimizer client is an Ajax application based on Dojo 1.1. Razor Optimizer Server is a Java web application that runs inside any Java Servlet container. The following figure shows the architecture of Razor Optimizer.
The Idea Behind Razor Optimizer
Razor is based on the following observations:
- JavaScript functions are the basic low level building blocks of JavaScript code. Though typical JavaScript applications are made up of JavaScript files, functions are at a lower level than files because each JavaScript file is composed of JavaScript functions. While current JavaScript optimization techniques operate on a “file” level, performing optimization at the function level could yield much better results;
- At any moment of time, the browser needs only one function because only one JavaScript function is executed at any moment of time.
- Theoretically, the application would work fine if we download only one function at a time, right before the function is going to be called. Other functions are not needed. They can stay on the server side without being downloaded until they are going to be called. There is no need to download all the code up front, and there is no need to download them at once;
- If only one function needs to be downloaded and stay on the client side, we can achieve breakthrough savings in both download size as well as client memory/CPU footprint, resulting in significant performance improvements above any other techniques.
The basic idea of Razor is to “trim” the “not needed” functions and only download these functions that are necessary for a specific usage scenario. This “trimming” process is called “raze”. After the initial download, if a “razed” function is needed, Razor will download this function on demand in the background.
Wouldn't downloading one function at a time be very slow? Indeed. However, if you package a bunch of related functions together and if this one "package" is enough to fulfill one or more use scenarios, the user wouldn't notice any negative performance impact of incremental downloading.
So the key to this approach is to understand when/which function is called during different runtime scenarios. For example, if we know exactly which functions are called and when they are called during the initial application loading, we can trim all other code from the initial download without breaking the application. This would significantly save the initial download size and improve page loading performance.
The knowledge of “when/which function is executed” can be achieved by profiling the application. By recording the profile data, we can have accurate knowledge of the dynamic runtime behavior of the application beyond static lexical analysis for delivering breakthrough optimization results.
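To picture the "download a razed function on demand" idea, here is a rough sketch of a stub that fetches the real implementation the first time it is called and then gets out of the way. This is my own illustration, not how Razor Optimizer is implemented: makeLazy, the registry object and the synchronous fetchImplementation loader are all assumptions. A real browser version would have to deal with asynchronous downloads.

```javascript
// Sketch of on-demand function loading (not Razor's actual mechanism).
// `fetchImplementation` is a hypothetical synchronous loader that returns the
// real function for a given name, for example from a packaged bundle.
function makeLazy(registry, name, fetchImplementation) {
  registry[name] = function stub() {
    var real = fetchImplementation(name); // "download" on first use
    registry[name] = real;                // later calls bypass the stub
    return real.apply(this, arguments);
  };
}
```

Callers keep invoking registry.someFunction(...) as before; only the first call pays the loading cost.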
What do you think of this approach?
Matt has a nice post on delaying JavaScript execution in a way that waits for certain events to finish:
If you're looking to execute javascript code whenever someone finishes (or stops temporarily) scrolling, moving the mouse, or resizing the page, you may find the following segment of code useful.
He shares the following boilerplate code:
This can be useful in a variety of ways, but it got me thinking about having the ability to download code lazily. For example, a friend shared information on an app that would wait for a click and then download code to run that functionality. This was bad, as it made it seem very slow indeed. Instead, the code could be split up into core (what has to be loaded as soon as possible) and then load other code when idly using this technique.
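For readers who want to experiment with the "wait until the user pauses" idea, here is a minimal sketch of the general technique, usually called debouncing. It is not Matt's snippet; debounce and its timers parameter are names chosen for this example, and the timers object exists only so the logic can be tested without real timeouts.

```javascript
// Minimal debounce sketch: the wrapped function runs only after `waitMs` has
// passed without another call. `timers` is injectable for testing and
// defaults to thin wrappers around the global setTimeout/clearTimeout.
function debounce(fn, waitMs, timers) {
  var t = timers || {
    set: function (cb, ms) { return setTimeout(cb, ms); },
    clear: function (id) { clearTimeout(id); }
  };
  var pending = null;

  return function debounced() {
    var self = this;
    var args = arguments;
    if (pending !== null) {
      t.clear(pending); // a new event arrived: restart the wait
    }
    pending = t.set(function () {
      pending = null;
      fn.apply(self, args);
    }, waitMs);
  };
}
```

Attaching something like debounce(loadExtraCode, 250) to a scroll or mousemove handler (loadExtraCode being whatever lazy loader you have) would kick off the download once the user has been idle for a quarter of a second.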
Steve Souders posted on Runtime Page Optimizer, a tool that you can think of as a performance proxy. It sits on the server side, and cleans up content before it is sent back to the browser.
What can it do? Steve let us know:
RPO automatically implements many of the best practices from my book and YSlow, so the guys from Aptimize contacted me and showed me an early version. Here are the performance improvements RPO delivers:
- minifies, combines and compresses JavaScript files
- minifies, combines and compresses stylesheets
- combines images into CSS sprites
- inlines images inside the stylesheet
- turns on gzip compression
- sets far future Expires headers
- loads scripts asynchronously
RPO reduces the number of HTTP requests as well as reducing the amount of data that is transmitted, resulting in a page that loads faster. In doing this the big question is, how much overhead does this add at runtime? RPO caches the resources it generates (combined scripts, combined stylesheets, sprites). The primary realtime cost is changing the HTML markup. Static pages, after they are massaged, are also cached. Dynamic HTML can be optimized without a significant slowdown, much less than what’s gained by adding these performance benefits.
Steve had another couple of interesting posts recently:
Ars Technica has a new columnist, John Resig. His first piece is on Extreme JavaScript Performance which has started to come to us in abundance recently!
His article focuses on the latest updates to the fish, SquirrelFish Extreme:
A popular technique that is gaining traction amongst JavaScript engine implementers is that of optimizing the engine, while it's still processing the JavaScript code, to determine the "type" of the object that is being used. Since JavaScript doesn't include any sort of explicit type system, JavaScript engines are frequently forced to check and re-check the values that they are handling, to ensure their integrity. SFX rounds out the collection of other modern JavaScript engines, namely V8 and TraceMonkey, to provide this form of polymorphic inline caching. Interestingly, the idea for this form of caching comes from the Self programming language, the origin of many of the ideas in JavaScript (such as using prototypal inheritance instead of the more-common classical form of object inheritance seen in languages like Java).
JavaScript engines are serving as the test bed for new forms of dynamic language optimization. No other language is seeing this level of competition and rapid improvement that JavaScript is. This is optimal considering that JavaScript is one of the most widely-deployed programming languages available.
The SquirrelFish Extreme release currently stands as the fastest JavaScript engine [based on SunSpider] (although that's certain to change as healthy competition continues).
Based on its performance on the regexes it does handle, WREC (WebKit Regular Expression Compiler) is indeed an awesome design. regexp-dna.js, however, is flawed and exaggerates SFX performance.
We could use nanojit to make a regex compiler for SpiderMonkey that would perform as well as WREC. But I don’t know if it’s worthwhile yet. Regex performance is much less important for today’s web than it is for SunSpider–I hope to link to a report on that in a future post.
That was the conclusion of David Mandelin of the Tamarin project as he looked into how "SquirrelFish Extreme (SFX) is kicking our butts so badly on regexp-dna.js."
I love David's posts, as they go into the real meat of the tech:
Technical details: the design of WREC. There are two main ways to implement regular expressions: using a backtracking matching engine, or by transforming the regex to a finite automaton (NFA, aka “state machine”), which does not backtrack. Most Perl-type regex engines, including both SpiderMonkey’s and WREC, follow the backtracking design. I don’t know the exact history of that choice, but at present it is much easier to implement features like group capture and backreferences in the backtracking design. Also, although some regexes scale only if implemented as NFAs, my tests suggest that many simple regexes, including those in SunSpider, are faster with backtracking.
As of this writing, WREC’s implementation strategy is dirt simple (which is a good thing). There are no transformations or fancy optimizations on the regex. WREC simply generates native code that directly implements the backtracking search. Thus, within a single match operation, there are no function calls, no traversals of regular expression ASTs, and few option tests, so almost all of the overhead is eliminated.
WREC’s code is very easy to read, so if you want to know exactly how it works, just read it in WREC.cpp. It’s also great example code for anyone implementing a compiler for a simple language like regular expressions. The basic plan is to parse the regular expression with functions named things like parseDisjunction (the | operator). Those functions directly call functions like generateDisjunction that generate the native code using the same assembler that the call-threading interpreter uses. There’s also the oddly named “gererateParenthesesResetTrampoline”. Inexplicably preserved typo, or watermark to detect copying of WREC code?
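To get a feel for what "backtracking" means here, the following is a tiny matcher in the style of the well-known Kernighan and Pike example, ported to JavaScript. It is only a teaching sketch: it interprets the pattern rather than generating native code as WREC does, and it supports just literals, ".", "*", "^" and "$". The function names are my own.

```javascript
// Tiny backtracking matcher: literals, '.', '*', '^' and '$' only.
// Returns true if `regexp` matches anywhere in `text`.
function match(regexp, text) {
  if (regexp.charAt(0) === '^') {
    return matchHere(regexp.slice(1), text);
  }
  // Try every starting position, including the empty tail.
  for (var i = 0; i <= text.length; i += 1) {
    if (matchHere(regexp, text.slice(i))) { return true; }
  }
  return false;
}

// Match `regexp` at the beginning of `text`.
function matchHere(regexp, text) {
  if (regexp.length === 0) { return true; }
  if (regexp.charAt(1) === '*') {
    return matchStar(regexp.charAt(0), regexp.slice(2), text);
  }
  if (regexp === '$') { return text.length === 0; }
  if (text.length > 0 &&
      (regexp.charAt(0) === '.' || regexp.charAt(0) === text.charAt(0))) {
    return matchHere(regexp.slice(1), text.slice(1));
  }
  return false;
}

// Match c* followed by `regexp`: try zero repetitions of c, then one, then
// two, and so on. Each failed attempt is abandoned and the next repetition
// count is tried, which is the backtracking.
function matchStar(c, regexp, text) {
  var i = 0;
  while (true) {
    if (matchHere(regexp, text.slice(i))) { return true; }
    if (i < text.length && (c === '.' || text.charAt(i) === c)) {
      i += 1;
    } else {
      return false;
    }
  }
}
```

A compiler like the one David describes would emit machine code that performs the same kind of search directly, instead of walking the pattern string on every match.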
Steve Souders is launching Hammerhead today at The Ajax Experience.
What is Hammerhead? I kinda think of it as continuous integration for performance. It is a Firebug plugin that you can set up to monitor the performance of your application. Imagine you add a new feature that you think will speed things up: this tool will let you know how performance was really affected.
There are also cool features when you just want to whip it up on your own Firebug:
Even if you’re not hammering a site, other features make Hammerhead a useful add-on. The Cache & Time panel, shown in Figure 3, shows the current URL’s load time. It also contains buttons to clear the disk and memory cache, or just the memory cache. It has another feature that I haven’t seen anywhere else. You can choose to have Hammerhead clear these caches after every page view. This is a nice feature for me when I’m loading the same page again and again to see its performance in an empty or a primed cache state. If you forget to switch this back, it gets reset automatically next time you restart Firefox.
Finally, Steve Lamm posted on the Google Code blog about testing slower connections as well as the high speed one that you are probably on, and the techniques for doing that with Hammerhead.
Steve continues to come up with small useful tools for Web developers. Thanks Steve!
We've heard a lot about optimizing CSS, HTML and JavaScript but one thing that is less talked about is how much extra information image editors put into image files. You might think you've done a great job optimizing your GIFs, PNGs and JPGs while still keeping them visually pleasing but when you use a text editor you'll realize that there is quite a big amount of data you can save by removing information about the image editor used, the date the file was edited last and lots of other bits that really are redundant.
There are a lot of free tools that strip this information from the files for you and squeeze some extra optimization out of the file without affecting the look. The problem is that all of them are command-line based and you need to know how to use them. Stoyan Stefanov and Nicole Sullivan of the Yahoo exceptional performance team took all of these tools and their experience in using them and built one application that does all the optimizations for you in one go:
You can upload images, give it a URL or use smushit as a Firefox extension or bookmarklet. Smushit will show you how many bytes you can save by removing cruft from the images and gives you all the images as a zip file to replace them on your site.
Here's a video of Stoyan and Nicole presenting Smushit.com at The Ajax Experience in Boston (sorry about the audio):
While Ben and I were talking about JavaScript performance (and other things) at Web 2.0 Expo NYC, Maciej Stachowiak announced SquirrelFish Extreme, the very new and improved version that appears to do very well at SunSpider:
- SquirrelFish Extreme: 943.3 ms
- V8: 1280.6 ms
- TraceMonkey: 1464.6 ms
What makes it so fast?
SquirrelFish Extreme uses four different technologies to deliver much better performance than the original SquirrelFish: bytecode optimizations, polymorphic inline caching, a lightweight “context threaded” JIT compiler, and a new regular expression engine that uses our JIT infrastructure.
1. Bytecode Optimizations
When we first announced SquirrelFish, we mentioned that we thought that the basic design had lots of room for improvement from optimizations at the bytecode level. Thanks to hard work by Oliver Hunt, Geoff Garen, Cameron Zwarich, myself and others, we implemented lots of effective optimizations at the bytecode level.
One of the things we did was to optimize within opcodes. Many JavaScript operations are highly polymorphic - they have different behavior in lots of different cases. Just by checking for the most common and fastest cases first, you can speed up JavaScript programs quite a bit.
In addition, we’ve improved the bytecode instruction set, and built optimizations that take advantage of these improvements. We’ve added combo instructions, peephole optimizations, faster handling of constants and some specialized opcodes for common cases of general operations.
2. Polymorphic Inline Cache
One of our most exciting new optimizations in SquirrelFish Extreme is a polymorphic inline cache. This is an old technique originally developed for the Self language, which other JavaScript engines have used to good effect.
Here is the basic idea: JavaScript is an incredibly dynamic language by design. But in most programs, many objects are actually used in a way that resembles more structured object-oriented classes. For example, many JavaScript libraries are designed to use objects with “x” and “y” properties, and only those properties, to represent points. We can use this knowledge to optimize the case where many objects have the same underlying structure - as people in the dynamic language community say, “you can cheat as long as you don’t get caught”.
So how exactly do we cheat? We detect when objects actually have the same underlying structure — the same properties in the same order — and associate them with a structure identifier, or StructureID. Whenever a property access is performed, we do the usual hash lookup (using our highly optimized hashtables) the first time, and record the StructureID and the offset where the property was found. Subsequent times, we check for a match on the StructureID - usually the same piece of code will be working on objects of the same structure. If we get a hit, we can use the cached offset to perform the lookup in only a few machine instructions, which is much faster than hashing.
Here is the classic Self paper that describes the original technique. You can look at Geoff’s implementation of the StructureID class in Subversion to see more details of how we did it.
We’ve only taken the first steps on polymorphic inline caching. We have lots of ideas on how to improve the technique to get even more speed. But already, you’ll see a huge difference on performance tests where the bottleneck is object property access.
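The StructureID idea can be modelled in a few lines of JavaScript. The sketch below is a toy model, not JavaScriptCore's implementation: makeObject, makeGetter and the flat slots array are inventions for illustration, and each access site remembers only a single structure (so strictly speaking it is a monomorphic cache). It does show the two paths described above: a name lookup on a miss, and a direct offset read on a hit.

```javascript
// Toy model of structure-based inline caching (illustrative only).
var structures = {};      // "x,y" -> { id: 1, offsets: { x: 0, y: 1 } }
var nextStructureId = 1;

// Objects with the same properties in the same order share one structure.
function makeObject(props) {
  var names = Object.keys(props);
  var key = names.join(',');
  var structure = structures[key];
  if (!structure) {
    structure = { id: nextStructureId++, offsets: {} };
    names.forEach(function (name, i) { structure.offsets[name] = i; });
    structures[key] = structure;
  }
  return {
    structure: structure,
    slots: names.map(function (name) { return props[name]; })
  };
}

// One getter per access site; it caches the last structure id and offset.
function makeGetter(name) {
  var cachedId = null;
  var cachedOffset = -1;
  var stats = { hits: 0, misses: 0 };

  function get(obj) {
    if (obj.structure.id === cachedId) {
      stats.hits += 1;
      return obj.slots[cachedOffset];        // fast path: no name lookup
    }
    stats.misses += 1;
    var offset = obj.structure.offsets[name]; // slow path: look up by name
    if (offset === undefined) { return undefined; }
    cachedId = obj.structure.id;
    cachedOffset = offset;
    return obj.slots[offset];
  }

  get.stats = stats;
  return get;
}
```

Reading "x" from a stream of point-like objects with the same shape misses once and hits every time after that, which is the case the quoted passage is optimizing for.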
3. Context Threaded JIT
Another major change we’ve made with SFX is to introduce native code generation. Our starting point is a technique called a “context threaded interpreter”, which is a bit of a misnomer, because this is actually a simple but effective form of JIT compiler. In the original SquirrelFish announcement, we described our use of direct threading, which is about the fastest form of bytecode interpretation short of generating native code. Context threading takes the next step and introduces some native code generation.
The basic idea of context threading is to convert bytecode to native code, one opcode at a time. Complex opcodes are converted to function calls into the language runtime. Simple opcodes, or in some cases the common fast paths of otherwise complex opcodes, are inlined directly into the native code stream. This has two major advantages. First, the control flow between opcodes is directly exposed to the CPU as straight line code, so much dispatch overhead is removed. Second, many branches that were formerly between opcodes are now inline, and made highly predictable to the CPU’s branch predictor.
Here is a paper describing the basic idea of context threading. Our initial prototype of context threading was created by Gavin Barraclough. Several of us helped him polish it and tune the performance over the past few weeks.
One of the great things about our lightweight JIT is that there’s only about 4,000 lines of code involved in native code generation. All the other code remains cross platform. It’s also surprisingly hackable. If you thought compiling to native code is rocket science, think again. Besides Gavin, most of us have little prior experience with native codegen, but we were able to jump right in.
Currently the code is limited to x86 32-bit, but we plan to refactor and add support for more CPU architectures. CPUs that are not yet supported by the JIT can still use the interpreter. We also think we can get a lot more speedups out of the JIT through techniques such as type specialization, better register allocation and liveness analysis. The SquirrelFish bytecode is a good representation for making many of these kinds of transforms.
4. Regular Expression JIT
As we built the basic JIT infrastructure for the main JavaScript language, we found that we could easily apply it to regular expressions as well, and get up to a 5x speedup on regular expression matching. So we went ahead and did that. Not all code spends a bunch of time in regexps, but with the speed of our new regular expression engine, WREC (the WebKit Regular Expression Compiler), you can write the kind of text processing code you’d want to do in Perl or Python or Ruby, and do it in JavaScript instead. In fact we believe that in many cases our regular expression engine will beat the highly tuned regexp processing in those other languages.
Since the SunSpider JavaScript benchmark has a fair amount of regexp content, some may feel that developing a regexp JIT is an “unfair” advantage. A year ago, regexp processing was a fairly small part of the test, but JS engines have improved in other areas a lot more than on regexps. For example, most of the individual tests on SunSpider have gotten 5-10x faster in JavaScriptCore — in some cases over 70x faster than the Safari 3.0 version of WebKit. But until recently, regexp performance hadn’t improved much at all.
We thought that making regular expressions fast was a better thing to do than changing the benchmark. A lot of real tasks on the web involve a lot of regexp processing. After all, fundamental tasks on the web, like JSON validation and parsing, depend on regular expressions. And emerging technologies — like John Resig’s processing.js library — extend that dependency ever further.
Major kudos to the entire SFX team for pulling this off. Now, to grab a new nightly...
Steve has found a new tidbit that has him excited. The feature at hand comes from Opera:
Primarily for low bandwidth devices, not well-tested on desktop. Ignore script tags until entire document is parsed and rendered, then execute all scripts in order and re-render.
Steve explains how he is a fan of splitting up JavaScript into a small core, and then loading other functionality asynchronously later. This defer gives you some of that benefit, and also groks document.write, which no other technique works with:
One limitation of these techniques is that you can’t use document.write, because when a script is loaded asynchronously the browser has already written the document. Hardcore JavaScript programmers avoid document.write, but it’s still used in the real world most notably, and infamously, by ads. A feature of Opera’s “Delayed Script Execution” option is that, even though scripts are deferred, document.write still works correctly. Opera remembers the script’s location in the page and inserts the document.write output appropriately.
So, this is why Steve is interested to dive deeper and see if this has the performance benefits that make sense theoretically:
One immediate benefit of this Opera preference is that web developers can see the impact of delay-loading their JavaScript. A practice I’m advocating a lot lately is splitting a large JavaScript payload into two pieces, one of which can be loaded using an asynchronous script loading technique. This is often a complex task as the JavaScript payload grows in size and complexity. With this “Delayed Script Execution” feature in Opera, developers can get an idea of how their page would feel before undertaking the heavy lifting.
I’m even more excited about how this shows us what is possible for the future. To be able to have asynchronous script loading and preserve document.write output is like having your cake and eating it too. It’s difficult for users to find this feature in Opera. And it’s beyond the reach of web developers. But if Opera’s “Delayed Script Execution” behavior was the basis for implementing SCRIPT DEFER in all browsers, it would open the door for significant performance improvements by simply adding six characters (“DEFER ”).
This is most significant for the serving of ads. Often ads are served by including a script that contains document.write to load other resources: images, flash, or even another script. Ads are typically placed high in the page, which means today’s pages suffer from slow loading ads because all their content gets blocked. And really, it’s not the pages that suffer, it’s the users. Our experience suffers. Everyone’s experience suffers. If browsers supported an implementation of SCRIPT DEFER that behaved similar to Opera’s “Delayed Script Execution” feature, we’d all be better off.
Food for thought for Safari, Firefox, and IE.
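To make the trade-off concrete, here is a minimal sketch of the kind of asynchronous loading Steve describes, where a script element is inserted from code so the parser does not wait for it. The function name loadScriptAsync, the doc parameter, and the /js/extras.js URL are made up for illustration and do not come from Steve's post. A script loaded this way is exactly the case where document.write no longer has a place in the page to write to.

```javascript
// Sketch: load a non-critical script without blocking the HTML parser.
// `loadScriptAsync` is a hypothetical helper, not code from the post.
function loadScriptAsync(url, onLoad, doc) {
  // `doc` defaults to the page's document; accepting it as a parameter
  // keeps the function testable outside a browser.
  doc = doc || document;

  var script = doc.createElement("script");
  script.src = url;

  // Assumption: the browser fires `onload` for script elements.
  // Older IE versions signal completion via `onreadystatechange` instead.
  if (typeof onLoad === "function") {
    script.onload = onLoad;
  }

  // Appending the element starts the download; parsing carries on.
  var parent = doc.getElementsByTagName("head")[0] || doc.documentElement;
  parent.appendChild(script);
  return script;
}
```

In a page, the small core would be loaded with a normal script tag and the rest with something like loadScriptAsync("/js/extras.js", initExtras), where initExtras is whatever inline code depends on the deferred payload.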
Sameer Chabungbam of Microsoft posted about the new JScript profiler, which includes the following functionality:

Eric Pascarello has also been looking at new tools, and wrote up his experience with the Google Chrome Debugger. He details the breakpoint walking functionality as well as the many commands available.
Brendan Eich jumped right in and benchmarked the tip of tree for TraceMonkey against the V8 version that came with Google Chrome:
We win on the bit-banging, string, and regular expression benchmarks. We are around 4x faster at the SunSpider micro-benchmarks than V8.
This graph does show V8 cleaning our clock on a couple of recursion-heavy tests. We have a plan, to trace recursion (not just tail recursion). We simply haven't had enough hours in the day to get to it, but it's "next".
Brendan shows SunSpider running there, and V8 has that and other benchmarks to run too. Isn't it great when a performance arms race is on? Thank god for competition here. We all win.
Ray Cromwell ran tests himself, on his own app Chronoscope (note, probably NOT using tip of tree TraceMonkey):
Chronoscope is written in GWT, and to some extent, the GWT compiler may negate some of Chrome's V8 technology in the sense that GWT "de-classes" many OO polymorphic dispatches into a more functional style of programming, removing as much dynamic dispatch as possible, and eliminating prototype lookups and function call overhead through inlining. I don't know if GWT hurts "hidden classes" or not, but it might be possible that if GWT didn't provide such optimizations, the performance differential might be larger.
Despite this, the results are still good. The test consisted of calling the chart's redraw() function 100 times per trial, with 10 trials. The slowest and fastest trial are thrown out, and the mean and standard deviation are calculated on the remaining data.
I tested on a Mac Pro 2.66GHz with 6GB of memory, OS X 10.5. The tests were conducted within a Parallels VM running XP Service Pack 2, given 2 CPUs and 2GB of memory. For each browser, I rebooted the VM from a clean start, and ran only the test browser.
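Ray's method (time each trial, throw out the slowest and fastest, then take the mean and standard deviation of the rest) is easy to reuse for your own comparisons. Below is a rough sketch of it. The names timeTrial and summarizeTrials are invented here, and since Ray does not say which form of standard deviation he used, the population form is an assumption.

```javascript
// Time one trial: call `fn` `callsPerTrial` times and return elapsed ms.
// (Ray's trials called the chart's redraw() 100 times each.)
function timeTrial(fn, callsPerTrial) {
  var start = Date.now();
  for (var i = 0; i < callsPerTrial; i++) {
    fn();
  }
  return Date.now() - start;
}

// Drop the single fastest and single slowest trial, then report the
// mean and standard deviation of the remaining trial times.
function summarizeTrials(timesMs) {
  if (timesMs.length < 3) {
    throw new Error("need at least 3 trials to trim both extremes");
  }
  // Sort a copy numerically so the caller's array is left untouched.
  var sorted = timesMs.slice().sort(function (a, b) { return a - b; });
  var kept = sorted.slice(1, sorted.length - 1);

  var sum = 0;
  for (var i = 0; i < kept.length; i++) {
    sum += kept[i];
  }
  var mean = sum / kept.length;

  var squaredDiffs = 0;
  for (var j = 0; j < kept.length; j++) {
    squaredDiffs += (kept[j] - mean) * (kept[j] - mean);
  }
  // Assumption: population standard deviation (divide by n, not n - 1).
  var stdDev = Math.sqrt(squaredDiffs / kept.length);

  return { mean: mean, stdDev: stdDev, kept: kept };
}
```

Collect ten values from timeTrial and pass the array to summarizeTrials to get numbers comparable in shape to Ray's.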
And for a bit of fun, Marc-Andre Cournoyer tied together HotRuby (remember that? the beast that runs YARV code in the browser!) and V8 to create fast Ruby in the browser.
Good times.
Razor Profiler is a web-based Ajax profiling tool to help web developers understand and analyze the runtime behavior of their JavaScript code in a cross-browser environment. Razor Profiler can either be accessed online as a service or be downloaded to run locally. It was created by Coach Wei, who has done a lot of work for Nexaweb and Apache.
Razor Profiler Features
Razor Profiler automates JavaScript profiling:
How Does Razor Profiler Work?
Razor Profiler consists of a server component that runs inside a standard Java EE Servlet engine, and a JavaScript-based client component that runs inside any browser. Once you have the Razor server started, you can profile your JavaScript application by entering the start URL of your application into Razor Profiler and running through your test scenarios. Razor Profiler will automatically record the data and visualize it for your analysis. There is no client side installation, browser configuration change or application code change required. In order to achieve this, Razor Profiler goes through five different phases:

You have seen this before: /path/to/something.js?v=2, or maybe it used a date or a version control id or some such. The idea is to put the version into the URL so you can cache aggressively and yet quickly push new versions.
There have long been issues with using the querystring as the version. At some point I seem to remember Safari not doing a good job of caching in that scenario and thinking that the resource was different.
Steve "Neo" Souders has posted about this issue especially as it relates to proxy servers and default configurations:
There’s a section in my book called Revving Filenames. It contains an example of adding a version number to the filename. That’s prompted several emails where people have asked me about tradeoffs around using a querystring versus embedding something in the filename. I wasn’t aware of any performance difference, but in a meeting this week a co-worker, Jacob Hoffman-Andrews, mentioned that Squid, a popular proxy, doesn’t cache resources with a querystring. This hurts performance when multiple users behind a proxy cache request the same file - rather than using the cached version everybody would have to send a request to the origin server.
I tested this by creating two resources, mylogo.1.2.gif and mylogo.gif?v=1.2. Both have a far future Expires date. I configured my browser to go through a Squid proxy. I made one request to mylogo.1.2.gif, cleared my cache (to simulate another user making the request), and fetched mylogo.1.2.gif again. This produces the following HTTP headers:

    >> GET http://stevesouders.com/mylogo.1.2.gif HTTP/1.1
    << HTTP/1.0 200 OK
    << Date: Sat, 23 Aug 2008 00:17:22 GMT
    << Expires: Tue, 21 Aug 2018 00:17:22 GMT
    << X-Cache: MISS from someserver.com
    << X-Cache-Lookup: MISS from someserver.com

    >> GET http://stevesouders.com/mylogo.1.2.gif HTTP/1.1
    << HTTP/1.0 200 OK
    << Date: Sat, 23 Aug 2008 00:17:22 GMT
    << Expires: Tue, 21 Aug 2018 00:17:22 GMT
    << X-Cache: HIT from someserver.com
    << X-Cache-Lookup: HIT from someserver.com

Notice that the second response shows a HIT in the X-Cache and X-Cache-Lookup headers. This shows it was served by the Squid proxy. More evidence of this is the fact that the Date and Expires response headers have the same values, even though I made these requests 10 seconds apart. For conclusive evidence, only one hit shows up in the stevesouders.com access log.
Loading mylogo.gif?v=1.2 twice (clearing the cache in between) results in these headers:

    >> GET http://stevesouders.com/mylogo.gif?v=1.2 HTTP/1.1
    << HTTP/1.0 200 OK
    << Date: Sat, 23 Aug 2008 00:19:34 GMT
    << Expires: Tue, 21 Aug 2018 00:19:34 GMT
    << X-Cache: MISS from someserver.com
    << X-Cache-Lookup: MISS from someserver.com

    >> GET http://stevesouders.com/mylogo.gif?v=1.2 HTTP/1.1
    << HTTP/1.0 200 OK
    << Date: Sat, 23 Aug 2008 00:19:47 GMT
    << Expires: Tue, 21 Aug 2018 00:19:47 GMT
    << X-Cache: MISS from someserver.com
    << X-Cache-Lookup: MISS from someserver.com

Here it’s clear the second response was not served by the proxy: the caching response headers say MISS, the Date and Expires values change, and tailing the stevesouders.com access log shows two hits.
Proxy administrators can change the configuration to support caching resources with a querystring, when the caching headers indicate that is appropriate. But the default configuration is what web developers should expect to encounter most frequently. Another interesting note about these tests: notice how the proxy downgrades the responses to HTTP/1.0. This is going to alter browser behavior in terms of the number of connections that are opened. When I’m doing performance analysis I make sure to avoid being connected through a proxy.
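If you want to move from the querystring style to the filename style that Steve tested, the URL rewrite itself is small. Here is one possible sketch. The helper name revFilename is made up, it only understands a lone v= parameter, and it assumes your server or build step will actually serve the file under the revved name (by renaming it or through a rewrite rule), which is outside what this snippet does.

```javascript
// Turn "mylogo.gif?v=1.2" into "mylogo.1.2.gif".
// `revFilename` is a hypothetical helper for illustration only.
// URLs that do not match the simple "?v=..." pattern are returned unchanged.
function revFilename(url) {
  // Capture everything before "?" and the value of a lone "v" parameter.
  var match = /^([^?#]*)\?v=([^&#]+)$/.exec(url);
  if (!match) {
    return url;
  }
  var path = match[1];
  var version = match[2];

  // Find the extension dot, making sure it belongs to the file name
  // and not to a directory or host name earlier in the URL.
  var dot = path.lastIndexOf(".");
  var slash = path.lastIndexOf("/");
  if (dot <= slash + 1) {
    return url; // no extension to insert the version before
  }
  return path.slice(0, dot) + "." + version + path.slice(dot);
}
```

For example, revFilename("/path/to/something.js?v=2") gives "/path/to/something.2.js", which a default Squid configuration like the one in Steve's test would be able to cache.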
This is the hack that John Gruber used to test whether iPhone 2.x had snuck in SquirrelFish. He was curious because of the performance improvements that he witnessed:

What about iPhone limits though? David Golightly tests the limits on the iPhone with a script that keeps downloading tiles until it can no longer do so:
After downloading about 210 images, the iPhone simply stops downloading new ones. This is probably due to hitting the hard 30MB same-page resource limit.
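David's script is not reproduced here, but the idea of loading images one after another and watching where the count stops can be sketched roughly as follows. Everything in it is an assumption for illustration: the loadTiles name, the callback parameters, the tile URLs, and the choice to keep a reference to every image so that it stays in memory.

```javascript
// Sketch: request images one at a time and report how many have loaded.
// When the device stops loading new images, the last reported count is
// the interesting number. All names here are hypothetical.
function loadTiles(makeImage, urlFor, maxImages, onProgress) {
  var kept = [];   // hold references so loaded images are not discarded
  var loaded = 0;

  function next() {
    if (loaded >= maxImages) {
      return; // safety cap so the test cannot run forever
    }
    var img = makeImage();
    kept.push(img);
    img.onload = function () {
      loaded++;
      onProgress(loaded);
      next(); // only request the next tile once this one has arrived
    };
    // Setting src starts the download; if it stalls, onload never fires
    // and the count simply stops advancing.
    img.src = urlFor(loaded);
  }

  next();
  return kept;
}
```

In a page this might be called as loadTiles(function () { return new Image(); }, function (i) { return "/tiles/" + i + ".png"; }, 1000, function (n) { document.title = n + " loaded"; }), with each tile URL pointing at a distinct image.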